AI Failed 97% of Freelancer Tasks, New Study Finds
Researchers have tested AI on freelance projects in several areas, including game development and data analysis. The results have been disappointing.
To find out whether artificial intelligence can carry out projects as effectively as humans, a group of researchers assigned work tasks to AI models and agents including Manus, Grok 4, Sonnet 4.5, GPT-5, ChatGPT agent, and Gemini 2.5 Pro, ZDNET reports.
These tasks had previously been completed successfully by real freelancers in fields such as game development, product design, architecture, data analysis, and video animation.
The tasks varied in complexity, with the largest costing as much as $10,000 and taking more than 100 hours of real-world time to complete.
To compare AI output with the real work of freelancers, the researchers developed an evaluation system called the Remote Labor Index (RLI).
“While AI systems already pass many existing tests, we found that even the most advanced AI agents perform at the base level within the RLI,” the researchers reported. “The best model achieved only 2.5% automation. This proves that current AI systems are unable to perform the vast majority of projects at the level of quality that is acceptable for custom work.”
Manus showed the best result, with an automation rate of 2.5%. Grok 4 and Sonnet 4.5 tied at 2.1%, followed by GPT-5 at 1.7% and ChatGPT agent at 1.3%. Gemini 2.5 Pro came in last at 0.8%.
One of the researchers, Dan Hendrycks, said that while modern AI is smart, it is still not very useful, given that no model automated even 3% of the tasks.
Explaining the reasons for this failure, Hendrycks noted that many AI capabilities remain deficient. The models lack long-term memory, so they cannot learn on the fly from experience. Their visual skills are also limited, even though many of the tasks required them.
The test deliberately included tasks that require a fairly high level of skill. AI would likely have handled other kinds of work and projects much more easily.
“While absolute automation rates are currently low, our analysis suggests that models are steadily improving and progress on these complex tasks is measurable,” the researchers note. “This creates a common basis for tracking the trajectory of AI-driven automation, allowing stakeholders to adapt to its implications early.”



