AI Still Struggles With Real-World Work Tasks
-

A new study tested how well AI can perform tasks that humans had already completed, and the results weren’t flattering. Researchers gave AI systems real projects from remote freelance work platforms, spanning game development, product design, architecture, data analysis, and video animation. These projects, which had taken human workers over 100 hours and cost over $10,000 to complete, form a benchmark called the Remote Labor Index (RLI), designed to gauge how effectively AI handles real-world, economically valuable work.
The findings were clear: current AI systems perform poorly on complex, creative work. Manus fared best with an automation rate of only 2.5%, followed by Grok 4 and Sonnet 4.5 at 2.1% each. GPT-5 managed 1.7%, ChatGPT agent 1.3%, and Gemini 2.5 Pro trailed at 0.8%. The researchers concluded that AI currently fails to complete the vast majority of projects at a quality acceptable for commissioned work, suggesting that fears of immediate mass job replacement may be overblown.
-
lowkey reassuring that creative, complex projects still need humans