There is a number sitting inside Stanford's 2025 AI Index Report that deserves far more attention than it has received. It is not about a new model capability, a benchmark record, or a fundraising round. It is about price.

Between November 2022 and October 2024, the cost to query an AI model that scores the equivalent of GPT-3.5 on MMLU dropped from $20.00 per million tokens to just $0.07 per million tokens, a more than 280-fold reduction in roughly two years.

Let that settle for a moment. A 280-fold cost reduction in a technology that two years ago was available only to companies with serious compute budgets and engineering depth. For context, this rate of price decline makes Moore's Law look sluggish. It is closer in magnitude to what happened with genome sequencing costs in the early 2010s, or with solar panel prices over the past decade. These are the kinds of collapses that do not just make existing products cheaper. They make entirely new categories of products possible.
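The arithmetic behind the headline is worth a quick sanity check. The snippet below just divides the two published price points; nothing else is assumed:

```python
price_2022 = 20.00  # $ per million tokens, November 2022 (GPT-3.5)
price_2024 = 0.07   # $ per million tokens, October 2024 (Gemini-1.5-Flash-8B)

fold_reduction = price_2022 / price_2024
print(f"{fold_reduction:.0f}x cheaper")  # ~286x, hence "more than 280-fold"
```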

The implications are not theoretical. They are happening right now, and they will reshape which businesses can compete in AI-driven markets, what software products look like in three years, and where the real value capture in the AI economy ends up sitting.


Where the 280x Number Actually Comes From

Before going further, it is worth being precise about what the Stanford figure is measuring. The benchmark in question is MMLU, a standardized test covering 57 subject areas across science, humanities, law, and professional domains. Stanford HAI defined GPT-3.5 level performance as a score of 64.8 on that benchmark and tracked which models reached that threshold, then compared their API pricing over time.

The model that achieved the $0.07 per million token price point was Google's Gemini-1.5-Flash-8B, a deliberately compact model. That is important context. The price did not fall because the same model got cheaper to run. It fell because a new, more efficient model was built that matched the old threshold at a fraction of the operational cost.

At the hardware level, costs have declined by 30% annually, while energy efficiency has improved by 40% each year. Those are structural, compounding trends. They do not reverse easily. The question is not whether AI will continue getting cheaper but how fast, and who benefits from that trajectory.
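To see why those two rates matter, compound them over a few years. The sketch below assumes, purely for illustration, that the 30% annual cost decline and 40% annual efficiency gain hold steady:

```python
# Compounding the reported hardware trends over a five-year horizon.
# Assumption for illustration only: the annual rates stay constant.
cost_decline = 0.30     # hardware cost falls 30% per year
efficiency_gain = 0.40  # energy efficiency improves 40% per year

for year in range(1, 6):
    relative_cost = (1 - cost_decline) ** year
    relative_energy = 1 / (1 + efficiency_gain) ** year
    print(f"Year {year}: hardware cost x{relative_cost:.2f}, "
          f"energy per unit of work x{relative_energy:.2f}")
# After five years: roughly 17% of the hardware cost, 19% of the energy.
```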


What Is Actually Driving the Cost Collapse

The 280-fold drop is the headline, but the mechanisms behind it matter for anyone trying to forecast where costs go from here.

Smaller, Smarter Models

The most significant driver is the emergence of capable small language models. In 2022, the smallest model registering a score higher than 60% on MMLU was PaLM, with 540 billion parameters. By 2024, Microsoft's Phi-3-mini, with just 3.8 billion parameters, achieved the same threshold. That represents a 142-fold reduction in model size in two years.

Running a 3.8 billion parameter model costs a tiny fraction of running a 540 billion parameter one, so the intelligence-per-dollar math has been rewritten. What changed is not just model size but training philosophy: Microsoft trained Phi-3 on carefully curated, textbook-quality synthetic data rather than massive unfiltered web crawls, showing that data quality can compensate for scale in many practical applications.
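A back-of-envelope comparison makes the gap concrete. It uses the common approximation that a dense decoder's forward pass costs roughly two FLOPs per parameter per generated token; this is an estimate, not a measurement of either model:

```python
# Rough inference-compute comparison via the ~2 * parameters
# FLOPs-per-token approximation for a dense decoder forward pass.
PALM_PARAMS = 540e9       # PaLM, 2022
PHI3_MINI_PARAMS = 3.8e9  # Phi-3-mini, 2024

def flops_per_token(params: float) -> float:
    return 2 * params  # standard back-of-envelope figure

print(f"Parameter ratio: {PALM_PARAMS / PHI3_MINI_PARAMS:.0f}x")  # ~142x
print(f"PaLM:       {flops_per_token(PALM_PARAMS):.1e} FLOPs/token")
print(f"Phi-3-mini: {flops_per_token(PHI3_MINI_PARAMS):.1e} FLOPs/token")
```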

Specialized Inference Hardware

Amazon, Google, and others have introduced custom chips built specifically for inference, the work of running trained models rather than training them. Groq, for example, claims 10x faster inference speeds and far lower energy consumption compared with standard GPUs. The distinction matters because training and inference have different computational profiles. Training requires dense matrix operations across enormous parameter counts. Inference is a narrower, more predictable workload, and custom silicon built around it delivers substantially better economics than general-purpose GPUs.

Intensifying Market Competition

When OpenAI had no real competitors, there was little incentive to cut prices. That world no longer exists. The cost reduction is also driven by intensified market competition, with more API providers and more transparent pricing emerging across the ecosystem. Anthropic, Google, Meta, Mistral, and a growing roster of inference providers are competing for the same developer and enterprise wallet. That competitive pressure has a direct effect on what any of them can charge without losing customers.

The Mixture-of-Experts Architecture Shift

Models like DeepSeek V3.2 and Alibaba's Qwen3-235B use Mixture-of-Experts architectures that activate only a subset of parameters per token at inference time. Qwen3-235B has 235 billion total parameters but activates only 22 billion per token. The result is frontier-level output quality at a fraction of the compute cost of a dense model of equivalent capability. These architectural innovations are not proprietary secrets. They are documented in public research papers, and the efficiency gains they enable are now flowing through the entire model ecosystem.
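The core idea is simple enough to sketch in a few lines. The toy layer below (PyTorch, with made-up dimensions; production MoE models add load-balancing losses, shared experts, and heavy systems engineering) routes each token to its top-k experts, so only those experts' parameters do any work:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a router sends each token to k of
    n_experts feed-forward blocks, so most parameters stay idle per token."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # mix the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([10, 64])
```

With eight experts and k=2, only a quarter of the expert parameters touch any given token. That is the same lever Qwen3-235B pulls at 22 of 235 billion parameters, just at a vastly larger scale.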


The Real-World Math for Businesses

Abstract cost curves are interesting. Concrete numbers are more useful for anyone running a business.

A customer service chatbot handling 10,000 conversations per day would have cost approximately $2,000 per month in 2022. Today, the same capability costs around $7 per month. The AI capabilities that would have cost a startup $396,000 per year in 2022 now cost roughly $1,392 per year.
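Those figures fall out of the Stanford price points if a conversation averages a few hundred tokens. The check below assumes roughly 330 tokens per conversation, a hypothetical figure chosen to match the article's numbers rather than measured from any real deployment:

```python
TOKENS_PER_CONVERSATION = 330  # hypothetical average, prompt + reply
CONVERSATIONS_PER_DAY = 10_000
DAYS_PER_MONTH = 30

monthly_tokens = TOKENS_PER_CONVERSATION * CONVERSATIONS_PER_DAY * DAYS_PER_MONTH

for label, dollars_per_million in [("2022", 20.00), ("2024", 0.07)]:
    cost = monthly_tokens / 1_000_000 * dollars_per_million
    print(f"{label}: ${cost:,.2f}/month")
# 2022: $1,980.00/month
# 2024: $6.93/month
```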

That is not a rounding error. It is a fundamental change in who can afford to build and deploy AI at scale. In 2022, an AI-powered product at any meaningful traffic level required either venture capital or a large corporate budget. In 2025, it fits inside a small business's software line item.

The convergence of affordability and capability means we are transitioning from a world where access to intelligence was scarce and gated to one where it becomes ambient and assumed. When the cost of computation fell, we got the personal computer. When bandwidth costs fell, we got YouTube and Zoom. The pattern is consistent across technology history: when infrastructure costs collapse, a wave of application-layer innovation follows that nobody fully anticipated at the start of the decline.


Who Is Responding — and How Fast

The adoption data already shows this shift in motion, and the pace is accelerating.

Stanford's AI Index reported that organizational AI adoption rose to 78% in 2024, up from 55% in 2023. That 23-point jump in a single year is one of the fastest adoption curves recorded for any enterprise technology category. For reference, enterprise cloud adoption took roughly a decade to move from early majority to near-ubiquity. AI is doing it in years.

The small business story is equally striking. A national survey found small business AI usage jumped from 39% in 2024 to 55% in 2025, a 41% increase. Among companies with 10 to 100 employees, usage jumped year-over-year from 47% to 68%.

The gap between large and small enterprises is closing faster than it has for any previous technology wave. In February 2024, large businesses used AI at 1.8 times the rate of small businesses. By August 2025, that gap had nearly closed: in the stricter tracking survey behind those figures, which counts only current use of AI in producing goods and services (hence absolute numbers far below the adoption rates above), small business usage reached 8.8% while large business usage actually declined slightly to 10.5%. The small businesses that are growing are now nearly twice as likely to be investing in AI as those that are struggling, suggesting the productivity differential is already showing up in revenue outcomes.


The Sectors Feeling It First

Cost collapse does not affect all industries equally or simultaneously. A few sectors are already being reshaped at the infrastructure level.

  • Customer Service and Support. The economics of AI-powered support have flipped so dramatically that the question for most businesses is no longer whether to use AI for first-tier support but how quickly to do it. At $7 per month for 10,000 conversations, the cost comparison with human agents is not close.
  • Content and Marketing. Small businesses are leading adoption here. Marketing automation and content generation are the top use cases among SMBs integrating AI into daily operations, partly because the tasks are well-defined and partly because the cost-per-output comparison with human labor is immediately visible.
  • Software Development. Developer tools powered by AI coding assistants have become standard equipment at both large enterprises and early-stage startups. The cost of adding AI assistance to a development workflow has dropped to essentially nothing at current pricing, which is why adoption in this category is near-universal among technology companies.
  • Legal and Financial Services. Document analysis, contract review, and financial report summarization represent high-value, high-volume tasks where cheaper inference directly translates to margin improvement. Enterprise deployments in these categories are scaling quickly, though governance and compliance requirements are adding friction that does not exist in less regulated sectors.
  • Healthcare. Regulated environments move more carefully, but clinical documentation, coding, and administrative workflows are seeing genuine productivity gains from AI deployment. The cost reduction makes the ROI calculation compelling even for organizations with significant compliance overhead.

The Paradox: Cheaper Per Token, More Expensive in Total

The 280x headline is real, but there is a counterintuitive dynamic that any business deploying AI at scale needs to understand.

The frontier has moved. Users who were impressed by basic completion in 2022 now expect multi-step analysis with citations. What counts as acceptable AI performance has escalated continuously. So while efficient small models exist and inference hardware has improved significantly, actual deployments often use more capable and expensive models than they did previously because user expectations increased faster than efficiency gains.

Enterprise generative AI spending exploded from $11.5 billion in 2024 to $37 billion in 2025, a 3.2x increase, even as per-token costs dropped significantly. The average monthly AI budget rose 36% in 2025 to reach $85,521. The unit cost went down, but the volume and sophistication of what organizations are doing with AI went up faster.

This is not a reason to discount the cost collapse. It is a reason to be clear-eyed about what cheap inference actually buys. It buys access to yesterday's frontier capability at commodity prices. It does not automatically buy today's frontier capability at the same discount. Organizations that understand this distinction will make better infrastructure decisions than those that assume cheap inference means cheap AI across the board.

There is also the question of reasoning models. Advanced models like OpenAI's o1 series or DeepSeek R1 use test-time compute, generating extended chains of reasoning before producing an answer, so they consume far more tokens per query than standard completion models. The o1 model is nearly six times more expensive and 30 times slower than GPT-4o. The per-query cost of intelligence at the absolute frontier has actually risen, even as the cost of baseline intelligence has collapsed.
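To see how that plays out per query, combine the two effects: higher per-token list prices and far more billed output tokens, since reasoning models charge for their hidden chain of thought as output. The prices below are the late-2024 published per-million-token rates; the token counts are illustrative assumptions, not measurements:

```python
# Illustrative per-query cost. Prices are late-2024 list rates ($ per
# million tokens); token counts are assumed for illustration only.
PRICES = {  # (input $/M, output $/M)
    "gpt-4o": (2.50, 10.00),
    "o1":     (15.00, 60.00),
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Same 1,000-token prompt; o1 assumed to bill ~5,500 reasoning + answer
# tokens versus 500 answer tokens for GPT-4o.
print(f"gpt-4o: ${query_cost('gpt-4o', 1_000, 500):.4f}")  # ~$0.0075
print(f"o1:     ${query_cost('o1', 1_000, 5_500):.4f}")    # ~$0.3450
```

On those assumptions a single o1 query costs around 46 times a GPT-4o query, which is how a 6x price gap becomes a much larger bill in practice.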


Risks and Limits That Are Not Going Away

Lower cost removes financial barriers to AI adoption. It does not remove the other barriers, and it is worth being direct about what those are.

  • Safety and Reliability. Cheap models are not always safe models. Lower-cost API tiers and open-weight models vary considerably in their safety alignment. Businesses deploying AI in customer-facing contexts without adequate guardrails are taking on liability and reputational risk that cheaper inference does nothing to reduce.
  • Data Privacy and Compliance. Sending sensitive business data to any third-party API carries regulatory risk that varies by industry and jurisdiction. Lower pricing does not change the data governance obligations. In regulated industries, the total cost of compliant AI deployment includes legal review, data handling infrastructure, and audit trails that are not reflected in per-token API pricing.
  • Quality Variance. GPT-3.5 level performance, which is what the 280-fold reduction buys at $0.07 per million tokens, is adequate for many tasks and inadequate for others. Businesses that deploy cheap inference without testing it against their actual use cases will discover this at the worst possible time.
  • The Harmful Incident Risk. The AI Incident Database recorded 233 harmful or dangerous incidents in 2024, up from roughly 150 in 2023 and 100 in 2022. Incidents included false identification by anti-theft AI, deepfake content, and chatbots encouraging self-harm. Faster, cheaper deployment plausibly contributes to the rising incident rate, so organizations scaling quickly should invest in monitoring and testing infrastructure at a pace that matches their deployment velocity.

What This Means for Founders and Investors

For startup founders, the cost collapse changes the unit economics of AI-native products in ways that were not possible two years ago. Products that would have been margin-negative at 2022 inference costs can now be profitable at meaningful scale. This is allowing a new generation of AI-native SaaS companies to emerge with significantly lower capital requirements than the previous cohort.

For investors, the deflationary wave is reshaping the AI ecosystem, favoring widespread deployment over elite performance and letting AI-native applications flourish at the margins. The next big returns may come not from the model builders, but from those embedding cheap, ubiquitous AI into everyday products. The infrastructure layer (foundation model APIs) is becoming commoditized. The application layer, where AI capability is embedded into specific workflows for specific industries, is where the new margin is forming.

The analogy that keeps surfacing in investment circles is cloud computing. AWS made compute cheap and reliable. The outsized returns did not go to AWS customers who used it the same way everyone else did. They went to companies like Airbnb, Uber, and Shopify, which built fundamentally different products because cheap, scalable compute made them viable. Cheap, scalable inference is the current equivalent of that moment.


What Comes Next

OpenAI's CEO Sam Altman has put it plainly: the cost of intelligence will converge to the cost of energy. Major AI providers are acting accordingly, with Microsoft contracting the entire output of a revived nuclear reactor and Amazon investing over $52 billion in nuclear projects across multiple states. The long-run cost floor for inference is an energy cost, and the race to source that energy more cheaply is well underway.

The trajectory of the past two years suggests the 280x reduction seen between 2022 and 2024 is not an anomaly. Architectural innovations continue to compound. Hardware keeps improving. The competitive pressure among providers is not abating. The reasonable expectation is that another significant reduction in the cost of capable inference occurs over the next two years, extending access further down-market and enabling entirely new categories of AI-powered products that nobody is building yet because the economics do not quite work today.

The businesses that will feel this most are the ones that have not yet started. The cost barrier that kept AI experimentation out of reach for smaller organizations has largely been removed, which means the remaining gap is one of learning, not budget. The competitive advantage that early movers have accumulated is real, and it compounds with time.

The 280x reduction is not a press release stat. It is the underlying condition that explains almost everything else happening in the AI industry right now: the explosion in small business adoption, the proliferation of AI-native startups, the pressure on large software companies to embed AI into existing products, and the scramble by every enterprise category to figure out what an AI-native version of their workflow looks like. All of it flows from the same source: intelligence got cheap, and cheap technology always finds its way into everything.


Frequently Asked Questions

What does the Stanford 280x AI cost reduction actually mean?

Stanford's 2025 AI Index Report found that the cost of querying an AI model performing at GPT-3.5 quality, defined as a 64.8 score on the MMLU benchmark, dropped from $20 per million tokens in November 2022 to $0.07 per million tokens by October 2024. That is a 280-fold reduction in roughly two years, driven by more efficient small models, specialized inference hardware, and increased competition among AI providers.

Why did AI inference costs fall so dramatically in such a short time?

Three main forces converged. First, smaller and more efficient models emerged, with Microsoft's Phi-3-mini achieving GPT-3.5-level performance with 3.8 billion parameters, compared with the 540 billion previously required. Second, specialized inference chips from providers like Google and custom silicon companies dramatically improved speed and energy efficiency. Third, intense competition among API providers pushed prices down across the board. Underneath all three, hardware costs have fallen 30% annually and energy efficiency has improved 40% per year, making each generation of infrastructure substantially cheaper to operate.

Does cheaper AI mean businesses can now do everything with AI for almost nothing?

Not quite. The 280x reduction applies to GPT-3.5-level performance, which is adequate for many everyday tasks like customer service, content drafting, and document summarization. Frontier reasoning models that handle complex multi-step analysis remain substantially more expensive. User and business expectations have also risen alongside model capabilities, so organizations often end up deploying more capable, and more expensive, models than the headline number implies. Total enterprise AI spending has actually increased even as per-token costs fell.

How are small businesses benefiting from lower AI costs?

Significantly and measurably. U.S. small business AI adoption jumped from 39% in 2024 to 55% in 2025. Among firms with 10 to 100 employees, adoption rose from 47% to 68% in a single year. Tasks like marketing content generation, customer service automation, and appointment scheduling, which would have cost thousands of dollars per month to automate in 2022, now cost tens of dollars. The gap between large enterprise and small business AI adoption is closing faster than it has for any previous major technology category.

What are the risks of deploying cheap AI inference in a business context?

Several. Lower-cost models vary considerably in their safety alignment, which creates risk in customer-facing deployments without proper guardrails. Sending business data to third-party APIs carries regulatory and compliance obligations that do not change with price. The AI Incident Database recorded 233 harmful AI incidents in 2024, up from 150 in 2023, partly reflecting faster and less careful deployment at scale. Businesses scaling AI quickly should invest in testing, monitoring, and governance infrastructure at a pace that matches their deployment velocity.

How should investors think about the AI cost collapse?

The deflationary pressure on inference costs commoditizes the foundation model API layer over time. The investment opportunity is shifting toward application-layer companies that embed cheap AI capability into specific industry workflows, and toward infrastructure providers that enable efficient deployment and management of AI at scale. The parallel is cloud computing: AWS made compute cheap, but the outsized returns went to companies that built fundamentally new products on top of that cheap infrastructure, not to AWS customers who used it conventionally.

Will AI inference costs keep falling after 2025?

The structural conditions that drove the 280x reduction are still present and still improving. Model architectures continue to become more efficient, hardware keeps improving, and the competitive market for inference services is not contracting. The long-run cost floor is primarily an energy cost, and major providers are racing to secure cheaper energy through nuclear power and renewable sourcing agreements. Another significant reduction in the cost of capable AI inference over the next two to three years is a reasonable expectation, though the rate will vary by capability tier.

What is the difference between inference cost and training cost, and why does it matter?

Training cost is the one-time expense of building an AI model from scratch, which has reached hundreds of millions of dollars for frontier models. Inference cost is the ongoing per-query expense of running a trained model to generate responses. The 280x reduction refers entirely to inference cost. Training costs have actually increased significantly over the same period and continue to climb. This means building cutting-edge foundation models remains a capital-intensive endeavor restricted to well-funded labs, while using those models or their efficient smaller derivatives is now accessible to essentially any business.

