Top AI Models Struggle as Automation Hype Meets Reality

The systems tested included models from major industry players: OpenAI’s GPT-5 and ChatGPT Agent, Anthropic’s Claude Sonnet 4.5, Google’s Gemini 2.5 Pro, and xAI’s Grok 4. Despite bold claims about “PhD-level” intelligence and advanced coding abilities, none of the models exceeded a 2.5% automation rate. Gemini 2.5 Pro ranked last at just 0.8%, while GPT-5 managed 1.7%.
The findings raise fresh doubts about the rush to replace workers with AI. Separate research from the Massachusetts Institute of Technology (MIT) found that most companies piloting AI saw no meaningful revenue growth, and many reported an increase in low-quality output that required heavy revision. While AI firms continue to pitch agents as workforce replacements, this latest benchmark suggests that, at least for now, human freelancers remain far more productive and adaptable.