Google quietly dropped Gemma 4 today, and the benchmarks are the kind of numbers that make you do a double-take. On the AIME 2026 math benchmark, the new 31B model scored 89.2%. The previous Gemma 3 version scored 20.8%. That is not an improvement — that is a different category of model entirely.

Gemma 4 comes in four sizes: an Effective 2B and an Effective 4B for edge and mobile deployment, a 26B Mixture-of-Experts, and a dense 31B for serious workloads. All four are open source under Apache 2.0, meaning you can download, modify, and deploy them commercially with zero licensing fees. No subscription. No usage caps. No terms-of-service renegotiation every six months.
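
If Gemma 4 ships with the same Hugging Face packaging as Gemma 3, getting a checkpoint running locally is a few lines of Python. A minimal sketch, assuming a hypothetical model ID of google/gemma-4-4b-it (check the official model card for the real identifier once weights are live):

```python
# Minimal local-inference sketch with Hugging Face transformers.
# The model ID below is an assumption, not a confirmed identifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-4b-it"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # spreads layers across available GPUs/CPU
)

# Chat-style prompt via the tokenizer's built-in chat template
messages = [{"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```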

Every model in the family supports multimodal input out of the box: text, images, video up to 60 seconds, and audio. Context windows run up to 256,000 tokens on the larger models. The 31B currently ranks #3 among all open models globally on the LMArena text leaderboard. The 26B ranks #6. Google also launched the Gemma 4 AICore Developer Preview for Android today, so the smaller models now run natively on-device, no internet required.
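
Multimodal use would presumably look much the same. This sketch assumes Gemma 4 keeps Gemma 3's image-text-to-text interface in transformers; the model ID and image URL are placeholders:

```python
# Multimodal sketch: image + text in one prompt. Assumes Gemma 4
# follows Gemma 3's processor/chat-template conventions; model ID
# and URL are placeholders.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "google/gemma-4-4b-it"  # hypothetical ID

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```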

On LiveCodeBench, a coding benchmark, Gemma 4 jumped from 29.1% to 80.0%. On GPQA science reasoning: 42.4% to 84.3%. These are not marginal gains. These are the benchmarks of a model that crossed a qualitative threshold — from "useful assistant" to "credible professional tool."

Gemma 4 is built directly on the same architectural foundation as Gemini 3. Google is essentially giving away Gemini-3-era intelligence. For developers building apps, startups bootstrapping on tight budgets, and researchers who cannot afford $200/month API bills, this changes the math completely. The 4B model runs on a mid-range Android phone. The 31B competes at the frontier.

Google's strategy is clear and it is working: flood the market with capable open models, keep developers building on Google infrastructure, let ecosystem advertising and cloud revenue cover the cost. It is a subsidized intelligence play, and it is putting real pressure on closed-model providers.

My Opinion

Here is the uncomfortable truth: premium AI subscriptions are becoming genuinely hard to justify for most people. GPT-4-class performance is now free, open source, and available offline. The gap between what you pay for and what you can get for nothing keeps narrowing at a pace that the premium providers cannot ignore.

What bugs me is the ongoing mismatch between valuations and sustainable moats. OpenAI just crossed $25 billion in annualized revenue. Anthropic is raising at figures that assume closed models stay premium forever. Meanwhile Google keeps shipping Gemma, Meta keeps shipping Llama, and Mistral keeps releasing open alternatives every few months. At some point, the only defensible advantage for closed models is raw frontier capability — and even that edge is measured in months, not years.

I will be blunt: if you are paying for an AI subscription to summarize documents, write emails, or help with code, Gemma 4 running locally just became your free alternative. The real frontier has moved. It is not "can it do this task?" anymore. It is "can it do this task while being orchestrated into an autonomous agent pipeline handling 20 things simultaneously?" Gemma 4 is now a credible player even in that conversation. Closed-model providers had better have an answer beyond "trust us, ours is better."
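
To make that concrete, here is the orchestration pattern I mean, sketched against a locally served Gemma 4 behind an OpenAI-compatible endpoint (vLLM and Ollama both expose one). The URL, model name, and prompts are placeholders, not a shipped API:

```python
# Fan 20 tasks out concurrently to a locally served model through an
# OpenAI-compatible endpoint. Assumes a server (e.g., vLLM or Ollama)
# is already running at the URL below; the model name is whatever
# your server registered.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

async def run_task(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="gemma-4-31b",  # placeholder: use your server's model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Summarize document #{i} in one sentence." for i in range(20)]
    # 20 concurrent requests; the local server batches them on one GPU
    results = await asyncio.gather(*(run_task(p) for p in prompts))
    for r in results:
        print(r)

asyncio.run(main())
```

Whether closed providers can out-orchestrate a free local fleet like that is the question they now have to answer.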


Author: Yahor Kamarou (Mark) / www.humai.blog / 06 Apr 2026