Наталя Хандусенко, 20 January 2026, 15:44

AI Failed 97% of Freelancer Tasks, New Study Finds

Researchers have tested AI on freelance projects in several areas, including game development and data analysis. The results have been disappointing.

To find out whether artificial intelligence can carry out projects as effectively as humans, a group of researchers gave work tasks to AI models including Manus, Grok 4, Sonnet 4.5, GPT-5, ChatGPT agent, and Gemini 2.5 Pro, ZDNET reports.

These tasks had previously been completed successfully by real freelancers in fields such as game development, product design, architecture, data analysis, and video animation. The tasks included:

an interactive dashboard for exploring data from the World Happiness Report;
a brewing-themed version of the game "Watermelon," where players combine falling items to reach the highest-level item;
a 3D animation demonstrating the features and design of a new pair of headphones and their charging case;
a 2D animated video promoting the offers of a company providing free services;
architectural plans and a 3D model of a container house based on an existing PDF project;
a document formatted for an IEEE conference using the provided functions and equations.

The tasks varied in complexity; the most demanding cost as much as $10,000 and took more than 100 hours of real-world time to complete.

To compare AI automation with the actual work of freelancers, the researchers developed an evaluation benchmark called the Remote Labor Index (RLI).

“While AI systems already pass many existing tests, we found that even the most advanced AI agents perform at the base level within the RLI,” the researchers reported. “The best model achieved only 2.5% automation. This proves that current AI systems are unable to perform the vast majority of projects at the level of quality that is acceptable for custom work.”

Manus performed best, with an automation rate of 2.5%. Grok 4 and Sonnet 4.5 tied at 2.1%, followed by GPT-5 at 1.7% and ChatGPT agent at 1.3%. Gemini 2.5 Pro came in last at 0.8%.

One of the researchers, Dan Hendrycks, said that while modern AI is smart, it is still not very useful, given that the overall automation rate is below 3%.

Explaining the reasons for this failure, Hendrycks noted that many AI capabilities remain deficient. The models cannot learn on the fly because they lack long-term memory. Their visual skills are also limited, even though many of the tasks required them.

The benchmark deliberately included tasks that require a fairly high level of skill. AI would likely have handled other types of work and projects much more easily.

“While absolute automation rates are currently low, our analysis suggests that models are steadily improving and progress on these complex tasks is measurable,” the researchers note. “This creates a common basis for tracking the trajectory of AI-driven automation, allowing stakeholders to adapt to its implications early.”
