In Rio de Janeiro this week, five researchers walked on stage at ICLR 2026 and quietly demolished the central premise of the last 18 months of AI investment.

Their paper is titled "The Reasoning Trap." The argument is a single sentence: training models to reason harder makes them hallucinate more, not less. They proved it. With benchmarks, ablations, and a name — SimpleToolHalluBench — that's about to become uncomfortably familiar to every lab betting the farm on chain-of-thought.

The numbers were already on the table. OpenAI's own data, surfaced through the PersonQA benchmark last year, showed o3 hallucinating on 33% of queries — more than double the 16% rate of its predecessor o1. The smaller o4-mini hit 48%. Forty-eight percent. That's not a model that occasionally gets confused. That's a model that's wrong half the time.

Until now, the industry handwave was that this was a measurement problem. Maybe the harder questions surfaced edge cases. Maybe the benchmarks were unfair. Maybe the "thinking" trace just made the same errors more visible.

The Reasoning Trap kills that excuse.

The authors — Chenlong Yin, Zeyang Sha, Shiwen Cui, Changhua Meng, and Zechao Li — show a clean causal chain. Take a base model. Apply reinforcement learning to make it reason step-by-step. Watch tool hallucination climb in lockstep with task performance. Apply supervised fine-tuning to instill reasoning. Same effect. Switch from direct answers to chain-of-thought at inference time, no retraining at all. Same effect again.

It's not the data. It's not the benchmark. It's the training objective itself.
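
To make the inference-time result concrete: you don't need any training at all to observe it, just a way to count calls to tools that don't exist. Here's a minimal sketch of such a counter in Python. The CALL(...) syntax, the tool names, and the generate() placeholder are all illustrative, not the authors' actual harness.

```python
import re

# Toy registry of tools the model is actually allowed to call.
# In a real harness this comes from the agent's tool schema.
REAL_TOOLS = {"search_web", "read_file", "run_sql"}

# Toy call syntax, e.g. "CALL(search_web)"; real harnesses parse JSON tool calls.
TOOL_CALL = re.compile(r"CALL\((\w+)\)")

def tool_hallucination_rate(transcripts: list[str]) -> float:
    """Fraction of transcripts that call at least one nonexistent tool."""
    hallucinated = sum(
        1 for text in transcripts
        if any(name not in REAL_TOOLS for name in TOOL_CALL.findall(text))
    )
    return hallucinated / max(len(transcripts), 1)

# generate(question, mode) is a placeholder for your model call; "mode"
# switches the prompt between direct answering and chain-of-thought.
# direct = [generate(q, mode="direct") for q in questions]
# cot    = [generate(q, mode="cot") for q in questions]
# print(tool_hallucination_rate(direct), tool_hallucination_rate(cot))
```

The paper's finding, in this framing: the second number comes out higher than the first, with the same weights.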

Mechanistically, they did something even more uncomfortable. They cracked the model open. Reasoning RL, they found, "disproportionately collapses tool-reliability representations" — the parts of the network that track whether a tool actually exists, whether it's the right one, whether the call will succeed. Those representations get flattened. Not removed. Flattened. The model still reasons. It just reasons confidently about things that aren't there.
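
The paper's probing setup isn't reproduced here, but the standard way to test a claim like this is a linear probe: train a logistic classifier on hidden states at the tool-call token to predict whether the referenced tool is real, then compare probe accuracy between the base and RL-tuned models. A sketch, with every variable name assumed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(hidden_states: np.ndarray, tool_exists: np.ndarray) -> float:
    """Accuracy of a linear probe predicting tool existence from activations.

    hidden_states: (n_examples, d_model) activations at the tool-call token.
    tool_exists:   (n_examples,) binary labels, 1 if the referenced tool is real.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden_states, tool_exists, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Placeholder usage; the arrays would come from the base and RL-tuned models.
# acc_base = probe_accuracy(states_base, labels)
# acc_rl   = probe_accuracy(states_rl, labels)
# A drop from acc_base to acc_rl is the "flattened representations" signature.
```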

My Opinion

I'll be blunt. This is the most important AI paper of the quarter, and almost nobody is talking about it.

Every frontier lab — OpenAI, Anthropic, Google, DeepSeek — is currently shoveling reinforcement learning into their flagship models to make them "think." That's the whole bet. GPT-5.5, Claude Opus, and Gemini 3 are all competing on reasoning benchmarks. The valuations, the investment, the $40 billion Google just handed Anthropic — it all runs on the assumption that smarter reasoning is the path forward.

The Reasoning Trap says the path forward is also the path to a 48% hallucination rate.

What bugs me is the industry response so far, which is silence. The mitigation strategies the authors tested — prompt engineering, Direct Preference Optimization — work, but only by trading capability for reliability. You can have one or the other. Pick. Nobody wants to pick. So the labs ship reasoning models that are smarter on benchmarks and dumber in production, and pretend the "agentic" use cases will somehow figure themselves out.
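
For readers who haven't met it, Direct Preference Optimization (DPO) nudges the policy toward preferred responses relative to a frozen reference model; for hallucination mitigation, the preferred response would be a grounded tool call and the rejected one a fabricated call. The loss is compact enough to show in full. This is the textbook formulation, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_w | x)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_l | x)
    ref_chosen_logps: torch.Tensor,       # log p_ref(y_w | x), frozen reference
    ref_rejected_logps: torch.Tensor,     # log p_ref(y_l | x)
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO objective: prefer y_w over y_l relative to the reference.

    For hallucination mitigation, y_w is a grounded response and y_l one
    that calls a tool that doesn't exist.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

Note the beta knob: it sets how strongly the policy is held near the reference model. Holding it closer suppresses hallucination but also gives back the reasoning gains, which is one plausible place the capability-for-reliability trade lives.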

They won't. If you've ever watched a reasoning model confidently call a tool that doesn't exist, invent an API endpoint, or misremember a function signature it wrote itself three turns ago, you know what 33% looks like. The Stanford agent results earlier this month — the best AI agents scoring half as well as PhDs on real work — weren't a coincidence. This is the same problem with a different label.
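
Until that objective exists, the unglamorous defense is to validate every proposed call before executing it and feed the failure back to the model. A sketch of that guard; the ToolCall shape and the schema table are made up, so adapt them to whatever your agent framework emits:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

# The source of truth: tool names mapped to their required argument names.
TOOL_SCHEMAS = {
    "search_web": {"query"},
    "read_file": {"path"},
}

def validate(call: ToolCall) -> str | None:
    """Return an error message to feed back to the model, or None if valid."""
    schema = TOOL_SCHEMAS.get(call.name)
    if schema is None:
        return f"Unknown tool '{call.name}'. Available: {sorted(TOOL_SCHEMAS)}"
    missing = schema - call.args.keys()
    if missing:
        return f"Tool '{call.name}' missing arguments: {sorted(missing)}"
    return None  # safe to execute
```

A guard like this doesn't fix the training objective. It just turns a silent wrong action into a visible retry, which in production is the difference between an error log and an incident.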

The next frontier isn't bigger reasoning. It's a training objective that optimizes for capability and reliability at the same time. That doesn't exist yet. Until it does, every "agentic" deployment is a coin flip wearing a suit.


Author: Yahor Kamarou (Mark) / www.humai.blog / 29 Apr 2026