Friday morning in Hangzhou. DeepSeek pushed V4 to its API and open-sourced the weights. By lunchtime in San Francisco, every AI engineer with a benchmark suite was stress-testing the numbers.

They held up.

V4-Pro is a 1.6 trillion parameter Mixture-of-Experts model with 49 billion parameters activated per forward pass. The smaller sibling, V4-Flash, ships at 284 billion total with 13 billion active. Both come with a one-million-token context window. Both are free to download, modify, and self-host.
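For a sense of how sparse these MoE configurations are, here's a back-of-envelope calculation using only the parameter counts quoted above (the model names and numbers are from this article; nothing else is assumed):

```python
# Fraction of weights active per forward pass, from the counts above
# (totals and active parameters, both in billions).
models = {
    "V4-Pro":   {"total_b": 1600, "active_b": 49},
    "V4-Flash": {"total_b": 284,  "active_b": 13},
}

for name, p in models.items():
    frac = p["active_b"] / p["total_b"]
    print(f"{name}: {frac:.1%} of weights active per token")
# V4-Pro: 3.1% of weights active per token
# V4-Flash: 4.6% of weights active per token
```

In other words, each token touches only a few percent of the network, which is how a 1.6-trillion-parameter model can serve at commodity prices.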

On SWE-bench Verified — the coding benchmark that actually simulates real engineering work — V4-Pro scored 80.6%. Claude Opus 4.6 scored 80.8%. That is a 0.2-point gap. A rounding error.

On MMLU, V4-Pro hit 88.4%. On the new Humanities-X reasoning test, 92.1%. Numbers that drop V4 squarely into the conversation with GPT-5 and Claude Opus 4.6 — except those models cost roughly $25 per million output tokens. V4-Pro charges $3.48. That's roughly a 7x price gap at near-identical coding performance.
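The ratios are easy to verify from the figures quoted in this article (the $25 is the rough Opus/GPT-5-class output price mentioned above, not an exact list price):

```python
# Sanity check on the price and performance gaps quoted above.
opus_price, v4_price = 25.00, 3.48   # approx. $ per million output tokens
opus_swe, v4_swe = 80.8, 80.6        # SWE-bench Verified scores (%)

price_ratio = opus_price / v4_price  # how much more the closed model costs
perf_ratio = v4_swe / opus_swe       # V4-Pro score relative to Opus 4.6

print(f"price gap: {price_ratio:.1f}x")         # price gap: 7.2x
print(f"relative performance: {perf_ratio:.2%}")  # relative performance: 99.75%
```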

Open source. You can run the weights on your own hardware. No rate limits. No terms of service that change every quarter.

Every American lab spent the last twelve months telling the world that compute is the moat. That the $500 billion data center buildout is the wall protecting frontier labs from fast followers. DeepSeek just shipped a model that matches their best on coding, reasoning, and world knowledge — at a price that doesn't require a hyperscaler to subsidize it.

The moat didn't hold. Or it held for fourteen months.

My Opinion

Here's what bugs me. The American AI story through 2025 was that scale equals advantage, and that advantage compounds. The story in 2026 is shaping up very differently. A Chinese lab with a fraction of the training budget just matched Claude Opus on SWE-bench within a rounding error — and then gave the weights away so anyone with an H100 cluster can run it.

I think Anthropic and OpenAI will be fine for now. They have distribution, enterprise contracts, safety reputations, and brand trust that DeepSeek cannot buy in six months. But the margin math just got ugly. If you are a CTO picking a coding model today, "7x cheaper at 99.75% of the performance, self-hostable, no vendor lock-in" is not a hard pitch. A lot of seat licenses are going to migrate between now and the end of the year.

The deeper point — the one the American AI press keeps dancing around — is that open-source frontier models are no longer a year behind. They are a rounding error behind on the benchmarks that matter, at a fraction of the cost. The question for closed labs isn't whether they can stay on top. It's whether staying on top is even worth what it costs anymore. Frontier labs used to compete with each other. Now they're competing with free.


Author: Yahor Kamarou (Mark) / www.humai.blog / 24 Apr 2026