AI Isn’t Ready to Replace Professionals — Yet
-

While no model passed the APEX-Agents benchmark, some came closer than others. Google’s Gemini 3 Flash led with 24% accuracy, followed closely by GPT-5.2 at 23%. Other top models hovered around 18%. That’s far from job-replacing performance, but it’s progress.

Foody likens today’s AI to an intern who gets things right one-quarter of the time. Just a year ago, that figure was closer to 5–10%. With the benchmark now public, AI labs are racing to improve. If the pace of the past year continues, the long-promised disruption of white-collar work may finally be on the horizon, though it has not arrived yet.