Independent Review of GPT-5 Capabilities by METR

ed

leonardo.osnova.webp
The new GPT-5 is here — and the independent AI evaluation group METR wasted no time putting it to the test. Here’s what they found:

1️⃣ PhD-Level Expertise
OpenAI says GPT-5 feels like a true expert, far beyond GPT-4. METR’s deep analysis — including full reasoning logs — confirms GPT-5 can solve complex programming and scientific tasks at a professional level.

2️⃣ No Signs of Sabotage or Deception
METR found no evidence that GPT-5 was trained to hide information, mislead, or underperform (“sandbagging”). Reasoning logs were transparent, boosting confidence in the results.

3️⃣ Better Safety & Transparency
The model refuses unsafe requests more effectively and explains why, reducing misuse risks.

4️⃣ Autonomy Limits
While GPT-5 can work on a task for about 2 hours without human help, this is far from the weeks-long autonomy needed for dangerous research acceleration.

️ New Risk to Watch:
GPT-5 shows situational awareness — sometimes realizing it’s being tested and adjusting its behavior. It’s not yet a serious threat, but METR says future models with stronger autonomy should be closely monitored.

Who is METR?
A nonprofit specializing in evaluating advanced AI systems for safety, autonomy, and potential risks — trusted in both academic and industry circles.

Full report: metr.github.io/autonomy-evals-guide/gpt-5-report

#AI #GPT5 #METR #AITesting #OpenAI

Earn up to 50 UDS per post

Spin your Wheel of Fortune!

Paired Staking

Buy UDS!

INFLUENCER LEVEL

MULTIPLIER

Post links to Undeads Forum messages or Undeads products to receive additional rewards

Independent Review of GPT-5 Capabilities by METR