OpenAI and Paradigm have introduced EVMbench, a new benchmarking framework designed to evaluate the ability of AI agents to detect, patch, and exploit blockchain vulnerabilities.
OpenAI and Paradigm officially launched EVMbench to address security risks in smart contracts that secure over $100 billion in crypto assets. The benchmark uses 120 curated vulnerabilities drawn from 40 professional audits, including scenarios from the Tempo blockchain, to test AI capabilities in a sandboxed Ethereum Virtual Machine (EVM) environment.
The system evaluates agents across three distinct modes: detection of vulnerabilities, functional patching of code, and end-to-end execution of fund-draining exploits. Recent testing shows that the GPT-5.3-Codex model achieves a 72.2% success rate in exploit tasks, marking a significant increase from the 31.9% score recorded by GPT-5 just six months ago.
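The per-mode results above can be illustrated with a minimal aggregation sketch. This is a hypothetical harness, not the actual EVMbench implementation: the mode names and result format are assumptions made for illustration.

```python
# Hypothetical sketch: aggregating per-mode success rates for a benchmark
# with "detect", "patch", and "exploit" tasks. The structure is an
# assumption for illustration, not the real EVMbench scoring code.
from collections import defaultdict

def success_rates(results):
    """results: list of (mode, passed) tuples, e.g. ("exploit", True)."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for mode, passed in results:
        totals[mode] += 1
        if passed:
            passes[mode] += 1
    return {mode: passes[mode] / totals[mode] for mode in totals}

# Toy run log (invented data, purely to show the shape of the output).
runs = [("detect", True), ("detect", False),
        ("patch", True),
        ("exploit", True), ("exploit", True), ("exploit", False)]
print(success_rates(runs))
```

A real harness would replace the toy tuples with the outcome of each sandboxed agent run, but the headline figures (such as a 72.2% exploit success rate) are simple ratios of this kind.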
“Measuring model capability in this domain helps track emerging cyber risks and highlights the importance of using AI systems defensively to audit and strengthen deployed contracts,” according to the OpenAI announcement.
🧭 FAQs
• What is the primary purpose of the EVMbench framework? It measures how effectively AI agents identify and resolve high-severity smart contract vulnerabilities.
• Which organizations collaborated to develop this new security benchmark? OpenAI and the crypto investment firm Paradigm co-developed the EVMbench testing environment.
• How does the system verify if an agent successfully patches code? Automated tests ensure vulnerabilities are eliminated without breaking the contract’s intended functional logic.
• Is there financial support available for researchers using these tools? OpenAI is committing $10 million in API credits to support defensive cybersecurity research.
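The patch-verification rule described in the FAQ, that a fix must eliminate the vulnerability without breaking the contract's intended behavior, can be sketched as a simple two-condition check. The function name and callable interface are illustrative assumptions, not the real EVMbench API.

```python
# Hedged sketch of the patch-verification logic: a patch counts as
# successful only if the exploit no longer works AND the contract's
# functional tests still pass. Names are assumptions for illustration.

def verify_patch(exploit_succeeds, functional_tests_pass):
    """Both arguments are callables returning bool, run against the
    patched contract inside a sandboxed EVM."""
    return (not exploit_succeeds()) and functional_tests_pass()

# A patch that blocks the exploit and preserves behavior passes;
# one that blocks the exploit but breaks functionality fails.
assert verify_patch(lambda: False, lambda: True) is True
assert verify_patch(lambda: False, lambda: False) is False
assert verify_patch(lambda: True, lambda: True) is False
```

The key design point is that both conditions are automated, so an agent cannot "pass" by simply deleting the vulnerable code path along with the contract's functionality.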