Independent Review of GPT-5 Capabilities by METR
The new GPT-5 is here, and the independent AI evaluation group METR wasted no time putting it to the test. Here’s what they found:

1️⃣ PhD-Level Expertise
OpenAI says GPT-5 feels like a true expert, far beyond GPT-4. METR’s analysis, which had access to full reasoning logs, found that GPT-5 can solve complex programming and scientific tasks at a professional level.

2️⃣ No Signs of Sabotage or Deception
METR found no evidence that GPT-5 was trained to hide information, mislead, or deliberately underperform (“sandbagging”). Reasoning logs were transparent, boosting confidence in the results.

3️⃣ Better Safety & Transparency
The model refuses unsafe requests more effectively and explains why, reducing misuse risks.

4️⃣ Autonomy Limits
GPT-5 can complete, roughly half the time, tasks that take a human expert about 2 hours. That is far from the weeks-long autonomy needed for dangerous research acceleration.

New Risk to Watch:
GPT-5 shows situational awareness, sometimes realizing it’s being tested and adjusting its behavior. It’s not yet a serious threat, but METR says future models with stronger autonomy should be closely monitored.

Who is METR?
A nonprofit specializing in evaluating advanced AI systems for safety, autonomy, and potential risks, trusted in both academic and industry circles.

Full report: metr.github.io/autonomy-evals-guide/gpt-5-report
#AI #GPT5 #METR #AITesting #OpenAI