OpenAI Launches EVMbench to Test AI on Smart Contract Exploits
-

OpenAI has introduced a new benchmark called EVMbench to evaluate how effectively AI agents can detect, patch, and even exploit vulnerabilities in smart contracts. Developed in collaboration with investment firm Paradigm and security specialist OtterSec, the framework tested models against 120 curated smart contract vulnerabilities drawn from real audit competitions.
Anthropic’s Claude Opus 4.6 achieved the highest average “detect award,” followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro. According to OpenAI, the goal is to measure AI performance in economically meaningful environments, especially as smart contracts secure billions in crypto assets and AI agents increasingly operate in financial systems.