Top AI Models Struggle as Automation Hype Meets Reality

tradelikepro

Among the systems tested were major industry names, including OpenAI’s GPT-5 and ChatGPT Agent, Anthropic’s Claude Sonnet 4.5, Google’s Gemini 2.5 Pro, and xAI’s Grok 4. Despite bold claims about “PhD-level” intelligence and advanced coding abilities, none of the models surpassed a 2.5% automation rate. Gemini 2.5 Pro ranked last with just 0.8%, while GPT-5 managed 1.7%.

The findings raise fresh doubts about the rush to replace workers with AI. Separate research from the Massachusetts Institute of Technology has shown that most companies piloting AI saw no meaningful revenue growth, and many reported an increase in low-quality output requiring heavy revisions. While AI firms continue pitching agents as workforce replacements, this latest benchmark suggests that, at least for now, human freelancers remain far more productive and adaptable.

madtrader

this is why benchmarks matter. marketing claims and real-world performance are two very different things
