The benchmark chart OpenAI posted on Wednesday has one bar that towers over every competitor: FrontierMath Tier 4. GPT-5.5 lands at 35.4%. Claude Opus 4.7, the model most serious engineers were running for research workloads, sits at 22.9%.
Then, at the bottom of the page, there's the pricing table. $5 input, $30 output per million tokens. That's exactly double what GPT-5.4 cost when it shipped six weeks ago.
What actually landed
OpenAI introduced GPT-5.5 on April 23, calling it the first fully retrained base model since GPT-4.5. The model is rolling out to the Plus, Pro, Business, and Enterprise tiers in ChatGPT. GPT-5.5 and GPT-5.5 Pro hit the API on April 24.
The benchmark haul is unusual in its breadth. On Terminal-Bench 2.0, which measures end-to-end agentic coding, GPT-5.5 posts 82.7% — enough to narrowly beat Anthropic's Claude Mythos Preview. On GDPval, which tests knowledge work across 44 occupations, it reaches 84.9%. On OSWorld-Verified, which tests whether a model can operate a computer on its own, it scores 78.7%. On Tau2-bench Telecom, a customer-service workflow benchmark, 98.0% — without any prompt tuning.
Then GPT-5.5 Pro, the research variant, scores 39.6% on FrontierMath Tier 4, roughly 1.7 times Claude Opus 4.7's 22.9%. For a benchmark that tests problems at the edge of what human mathematicians can solve, this is not a small delta.
Where Claude still wins
On SWE-Bench Pro, which measures realistic software engineering tasks on open-source repositories, Claude Opus 4.7 still posts 64.3% to GPT-5.5's 58.6%. So the picture isn't "OpenAI won." It's more interesting than that. GPT-5.5 cleaned up math and agentic computer use. Claude still owns the code-review seat.
OpenAI's engineers claim GPT-5.5 uses about 40% fewer output tokens than GPT-5.4 to finish the same Codex task. If true, the effective price per task rises by roughly 20%, not 100%: twice the per-token price multiplied by 0.6 times the tokens works out to 1.2 times the old cost, at least on output-dominated workloads. That's the arithmetic OpenAI would like developers to internalize. The sticker shock hides underneath a productivity claim no one has independently verified yet.
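The per-task arithmetic is easy to sanity-check yourself. Here is a minimal back-of-envelope sketch; the token counts are hypothetical and the GPT-5.4 prices are simply half of GPT-5.5's published $5/$30, per the doubling described above:

```python
def task_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one task, with prices quoted per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical Codex-style task: 10k input tokens, 50k output tokens.
# GPT-5.4 at $2.50 input / $15 output per million tokens.
old = task_cost(10_000, 50_000, 2.50, 15.00)

# GPT-5.5 at $5 / $30, with the claimed 40% fewer output tokens (50k -> 30k).
new = task_cost(10_000, 30_000, 5.00, 30.00)

print(f"old ${old:.3f}  new ${new:.3f}  ratio {new / old:.2f}x")
```

With these assumed counts the ratio comes out just above 1.2x, slightly worse than the headline 20% because input tokens get no efficiency discount: they cost twice as much and there are just as many of them.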
Why the pricing is the real story
For two years the story was: frontier models get cheaper every six months, following some version of a Moore's Law for inference. That story is over. GPT-5.5 is the first major flagship release in three years where the headline price went up rather than down.
OpenAI is betting capability gains justify premium pricing. They might be right. Developers running research workloads where FrontierMath-style reasoning matters will pay almost anything for the best model. But this doubles the cost of every GPT-5 workflow that was already working. Every enterprise that budgeted a GPT-5.4 contract for 2026 will see its model bill double if it upgrades.
My opinion
I'll be blunt. The benchmark numbers look genuinely impressive. GPT-5.5's 12-point lead on Tier 4 FrontierMath isn't marketing: that test was built with input from Fields Medalists to resist exactly this kind of progress, and scores typically move in single-digit increments between model releases. If GPT-5.5 Pro really hits 39.6%, that's the biggest jump in frontier math reasoning I've seen since GPT-5.
Here's what bugs me about the rollout. OpenAI has a strategic reason to price high right now: Anthropic is crushing them on enterprise revenue, DeepSeek V4 shipped open-source this morning at a fraction of the cost, and the capex demand for training these models is absurd. So they're charging what the market will bear. I understand the business logic. What I don't love is the framing — "more expensive but uses fewer tokens, so it's actually a deal." That's PR-speak. The honest version: we raised prices because our model is better and our investors need revenue.
The move I'd watch next is what Anthropic does on pricing when Claude Opus 5 ships. If they hold the current $15/$75 Opus pricing and wait six months to match on capability, they hand developers an obvious arbitrage on SWE-Bench workloads. If they raise prices in lockstep, we have an oligopoly confirmed. Either way, the era of cheaper-every-quarter is finished. Welcome to the new compute economy — your AI bill now grows with model quality, not in spite of it.
Author: Yahor Kamarou (Mark) / www.humai.blog / 24 Apr 2026