OpenAI has rolled out a new benchmarking system aimed at measuring how effectively artificial intelligence agents can identify and repair security flaws in crypto smart contracts.
Announced on February 18, the system, called EVMbench, was developed in collaboration with crypto investment firm Paradigm. It focuses on applications built for the Ethereum Virtual Machine (EVM), the execution environment that powers many blockchain-based financial applications.
Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH
— OpenAI (@OpenAI) February 18, 2026
OpenAI said the tool was designed to reflect real-world financial risks, noting that smart contracts currently safeguard more than $100 billion in open-source crypto assets. As AI systems grow more advanced, the company said, it is critical to evaluate how they perform in high-stakes security environments.
Benchmarking AI against real-world vulnerabilities
EVMbench tests AI agents across three core functions: detecting vulnerabilities, patching flawed code and executing simulated exploits. The dataset includes 120 high-severity issues drawn from 40 previous security audits, many sourced from public audit competitions.
Additional case studies were incorporated from reviews of the Tempo blockchain, a payments-focused network built for stablecoin transactions, to better reflect financial use cases.
To simulate attacks safely, OpenAI adapted existing exploit scripts and built new ones where necessary. All tests are conducted in isolated environments, and only previously disclosed vulnerabilities are used.
In detection mode, AI models analyze contract code to flag known weaknesses. In patch mode, they must correct those flaws without disrupting functionality. In exploit mode, agents attempt to drain funds from vulnerable contracts within a controlled sandbox.
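The three modes described above map naturally onto distinct scoring functions. The sketch below is a hypothetical illustration of how such a harness might grade an agent per mode; the function names, data shapes, and scoring rules are assumptions for illustration, not OpenAI's actual EVMbench implementation.

```python
from dataclasses import dataclass

# Hypothetical scoring sketch for the three benchmark modes.
# All names and scoring rules here are illustrative assumptions,
# not the real EVMbench harness.

@dataclass
class Case:
    contract_id: str
    known_flaws: set[str]  # ground-truth vulnerability labels for this contract

def score_detection(flagged: set[str], case: Case) -> float:
    """Detection mode: fraction of known flaws the agent flagged (recall)."""
    if not case.known_flaws:
        return 1.0
    return len(flagged & case.known_flaws) / len(case.known_flaws)

def score_patch(tests_pass: bool, flaw_fixed: bool) -> float:
    """Patch mode: credit only if the flaw is fixed AND functionality survives."""
    return 1.0 if (tests_pass and flaw_fixed) else 0.0

def score_exploit(drained_wei: int, target_wei: int) -> float:
    """Exploit mode: did the sandboxed agent drain the target balance?"""
    return 1.0 if drained_wei >= target_wei else 0.0

if __name__ == "__main__":
    case = Case("vault-01", {"reentrancy"})
    print(score_detection({"reentrancy", "integer-overflow"}, case))  # 1.0
    print(score_patch(tests_pass=True, flaw_fixed=True))              # 1.0
    print(score_exploit(drained_wei=0, target_wei=10**18))            # 0.0
```

Note how patch mode is the strictest of the three: a fix that breaks existing functionality scores zero, which mirrors the article's point that patching without disrupting behavior is harder than a single-objective exploit.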
Early results highlight strengths and limits
Using a custom-built evaluation framework, OpenAI tested several advanced models. In exploit scenarios, GPT-5.3-Codex scored 72.2%, a sharp increase from GPT-5's 31.9% result six months earlier. However, detection and patching tasks proved more challenging, underscoring the complexity of nuanced code review.
Researchers found that AI agents performed best when objectives were explicit, such as extracting funds, but struggled with broader analytical tasks involving large codebases.
The company said the benchmark is intended to strengthen defensive cybersecurity efforts. Alongside the release, OpenAI pledged $10 million in API credits to support open-source security initiatives and made EVMbench’s tools and datasets publicly available for further research.






































































































