Advanced AI Models Still Fall Short of Real Work
-

Despite claims of near-human intelligence, newer models such as GPT-5 and ChatGPT Agent showed limited ability to complete practical freelance work, scoring just 1.7% and 1.3%, respectively. Gemini 2.5 Pro ranked last at only 0.8%, underscoring how even advanced systems struggle with execution outside controlled environments.
The study introduced the “Remote Labor Index,” a benchmark designed to evaluate whether AI can deliver economically valuable output. Results indicate that current AI lacks key human abilities such as continuous learning, long-term memory, and adaptability, which are critical for completing multi-step or evolving tasks in real-world workflows.