A 17-year-old bug was sitting inside FreeBSD's network file system implementation. It had survived years of code review, fuzzing campaigns, and manual security audits. An unauthenticated attacker anywhere on the internet could use it to gain complete root access to any affected server.

Nobody knew it was there. Then Claude found it, built a working exploit, and flagged it for disclosure, all without human intervention after the initial prompt.

That was CVE-2026-4747, one of thousands of vulnerabilities that AI models have discovered across production software in the past few months. The bug is patched now. The question it raises is not.


The Scale of What AI Is Finding

The FreeBSD vulnerability is the headline case, but it is not an outlier. It is part of a pattern that has been building since late 2025 and reached a public inflection point in early April 2026.

MAD Bugs. Claude Opus 4.6 found more than 500 high-severity zero-day vulnerabilities in open-source software during a research initiative called MAD Bugs, which ran through April 2026. The model was given no custom scaffolding, no specialized prompting, and no task-specific tooling. Researchers placed it inside a virtual machine with a target codebase and standard utilities and let it work.
Firefox. Claude Opus 4.6 found 22 security vulnerabilities in Firefox in two weeks, including 14 classified as high severity. That figure exceeds the number of vulnerabilities reported in any single month throughout 2025. Mozilla engineers who reviewed the findings noted that some matched what traditional fuzzing had found, but others were entirely new classes of logic errors that fuzzers had never caught. Claude also wrote working exploits for two of the bugs. All findings were validated before disclosure and patched in Firefox 148.
OpenSSL. AI security startup AISLE discovered all 12 zero-day vulnerabilities announced in OpenSSL's January 2026 security patch, including a rare high-severity stack buffer overflow in CMS message parsing. Their system accounted for 13 of the 14 total OpenSSL CVEs assigned in 2025. Three of those bugs dated to between 1998 and 2000 and had gone undetected for 25 to 27 years. One predated OpenSSL itself, inherited from SSLeay, the original SSL implementation from the 1990s.
Claude Mythos Preview. The most advanced case is Anthropic's unreleased Claude Mythos Preview, which found thousands of vulnerabilities in every major operating system and every major web browser as part of the Project Glasswing initiative. On the Firefox 147 benchmark, Mythos developed 181 working exploits compared to just 2 for Claude Opus 4.6. Mythos saturates Anthropic's entire internal cybersecurity evaluation suite. In one case, it chained four separate vulnerabilities together to escape both the renderer and OS sandboxes in a single browser exploit.

Why AI Finds What Humans and Fuzzers Miss

Traditional automated security tools work by throwing massive volumes of random inputs at code to see what breaks. Fuzzing is computationally intensive, generates an enormous number of tests, and is genuinely effective at finding certain classes of bugs, particularly memory corruption in code that processes untrusted input.

Fuzzing does not read code. It does not reason about what the code is supposed to do, what the author might have assumed about its inputs, or how a subtle logical error made 17 years ago might interact with a refactoring that happened three years later.
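A toy example makes the gap concrete. The parser below is entirely hypothetical and resembles none of the codebases discussed here, but it shows the structural problem: a checksum sitting in front of a logic bug makes the bug all but unreachable for random inputs, while anyone who reads and understands the format can construct the triggering input directly.

```python
import random
import struct
import zlib

def parse(pkt: bytes) -> bytes:
    """Toy format: 2-byte big-endian length, payload, 4-byte CRC32 of the
    payload. Everything interesting sits behind the checksum check."""
    (length,) = struct.unpack(">H", pkt[:2])
    payload = pkt[2:2 + length]
    (crc,) = struct.unpack(">I", pkt[2 + length:2 + length + 4])
    if zlib.crc32(payload) != crc:
        raise ValueError("bad checksum")
    # Deliberate logic bug: an empty payload falls through to echoing the
    # two header bytes, data the caller never intended to expose.
    return payload if length else pkt[:2]

# Blind fuzzing: a random packet passes the CRC with odds around 1 in 2^32,
# so the buggy line is effectively unreachable by chance.
survivors = 0
for _ in range(50_000):
    pkt = bytes(random.randrange(256) for _ in range(random.randrange(32)))
    try:
        parse(pkt)
        survivors += 1
    except (ValueError, struct.error):
        pass
print("random packets past the CRC gate:", survivors)

# Reading the code tells you the exact input that triggers the bug:
# length == 0, plus the CRC32 of the empty payload (a known constant).
crafted = struct.pack(">H", 0) + struct.pack(">I", zlib.crc32(b""))
print("crafted packet leaks:", parse(crafted))
```

Coverage-guided fuzzers do better than this blind loop, but the underlying asymmetry stands: constructing the one semantically valid input that reaches a guarded bug is a reasoning task, not a sampling task.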

Claude reads and reasons about code the way a human security researcher would. Anthropic's red team describes the methodology: the model looks at past fixes to find similar bugs that were not addressed, spots patterns that tend to cause problems, and understands logic well enough to know exactly what input would break it. It runs the software, uses debuggers, forms hypotheses, and iterates.

This approach reaches bugs that fuzzers cannot. The FFmpeg case is instructive: a vulnerability introduced in a 2003 commit and made exploitable by a 2010 refactoring survived 16 years of scrutiny from every fuzzer and human reviewer who touched the code. Millions of automated test runs had executed the relevant line without triggering the flaw. Claude found it by reading the code and understanding what was wrong with the logic.

Anthropic's own characterization of this capability is striking: "We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy." The cybersecurity capability is not a separate feature built deliberately into the model. It is what happens when a model becomes generally better at understanding code, and every improvement in general coding ability also improves the model's ability to find and exploit security flaws.


The Disclosure Problem

AI can now find bugs far faster than maintainers can patch them, and nothing in the disclosure ecosystem was built for that imbalance.

The standard 90-day disclosure window was designed around the assumption that finding vulnerabilities is hard, slow, and requires specialized expertise. When Claude can find 500 high-severity bugs in a single research sprint, and Mythos can produce working exploits overnight from a simple prompt, that assumption no longer holds.

When Anthropic pointed Opus 4.6 at open-source software, the model found and validated more than 500 high-severity vulnerabilities and began the coordinated disclosure process. Many open-source projects are maintained by small teams or individual volunteers with no dedicated security resources, and the volume of incoming disclosures from AI-assisted research is already straining maintainer bandwidth.

The curl bug bounty program was a concrete casualty of this shift. Daniel Stenberg, curl's creator and primary maintainer, closed the program in January 2026 after AI-generated garbage submissions flooded it. About 20% of 2025 submissions were AI-produced low-quality noise, and only 5% of all submissions represented genuine vulnerabilities. The program that had run since 2019 and paid out over $90,000 for 81 real vulnerabilities was killed by the volume.

The same period produced a stark bifurcation: AISLE's AI system found all 12 CVEs in the January 2026 OpenSSL patch while the curl bug bounty collapsed under low-quality AI submissions. Mass adoption raises the median noise level while simultaneously raising the ceiling for what high-quality tools can find.

Mozilla engineers who reviewed Claude's Firefox findings compared the moment to the early days of fuzzing, suggesting that there is likely a substantial backlog of now-discoverable bugs across widely deployed software that decades of human review and automated testing have not found.


The Offense/Defense Asymmetry

The capability that makes AI useful for defenders makes it equally useful for attackers, and that asymmetry matters.

A researcher at Anthropic with no formal security training asked Mythos to find remote code execution vulnerabilities overnight and woke up the next morning to a complete, working exploit. The barrier to effective offensive security research has not just been lowered; it has been reduced to a prompt.

Trend Micro's AI security team found in January 2026 that threat actors are already using AI agents to automate vulnerability discovery. Palo Alto Networks' 2026 analysis identifies automated vulnerability scanning and exploit chaining as a top-tier concern, second only to hyper-personalized phishing.

XBOW, a commercial AI security agent, submitted more than 1,000 bug reports to security programs in a 90-day window, including 132 classified as critical. The working assumption for most security teams has been that finding exploitable bugs requires expertise that limits the pool of potential attackers. That assumption is no longer reliable.


What This Means in Practice

The practical implications break down into several concrete questions for anyone maintaining or depending on software.

Patch cycles need to be faster

The window between discovery and active exploitation has historically been measured in weeks. When attackers have access to the same AI capabilities that researchers do, that window will compress. Treating CVE-tagged dependency updates as urgent is no longer a best practice. It is a baseline requirement.
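The mechanics of that baseline are simple enough to sketch. The snippet below is illustrative only: the package names and advisory entries are invented, standing in for a real feed such as OSV or the live databases behind scanners like pip-audit and npm audit.

```python
# Gate a build on known-vulnerable dependency versions.
# ADVISORIES is hard-coded for illustration; in practice it would come
# from an advisory feed (OSV, GitHub Advisory Database, etc.).

# (package -> version tuple the CVE was fixed in; lower versions are affected)
ADVISORIES = {
    "examplelib": (1, 4, 2),   # hypothetical CVE fixed in 1.4.2
    "toycrypto": (0, 9, 0),    # hypothetical CVE fixed in 0.9.0
}

def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.3.9' into (1, 3, 9) so tuples compare numerically."""
    return tuple(int(part) for part in v.split("."))

def vulnerable(requirements: dict[str, str]) -> list[str]:
    """Return pinned dependencies that still carry an unpatched CVE."""
    flagged = []
    for name, version in requirements.items():
        fixed_in = ADVISORIES.get(name)
        if fixed_in and parse_version(version) < fixed_in:
            flagged.append(f"{name} {version} (fixed in {'.'.join(map(str, fixed_in))})")
    return flagged

pins = {"examplelib": "1.3.9", "toycrypto": "0.9.1", "unrelated": "2.0.0"}
for finding in vulnerable(pins):
    print("CVE-affected dependency:", finding)
```

Wired into CI with a non-zero exit on any finding, a check like this turns "urgent" from a policy statement into a failed build, which is what real scanners such as pip-audit already do for Python projects.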

AI should be running against your own code

If Anthropic's researchers can find 500 high-severity vulnerabilities by pointing a model at open-source code with no special configuration, the same approach works on private codebases. The question is whether your security team runs it first or someone else does.

Help Net Security's analysis of the Mythos findings states directly: organizations that have not yet integrated language models into vulnerability management should start with currently available frontier models, which are already capable of finding high- and critical-severity bugs across web applications, cryptography libraries, and production kernels.

The bug backlog is larger than anyone knew

Mozilla's engineers described the Claude Firefox findings as analogous to the early days of fuzzing, a moment when a new class of tool revealed that code everyone considered well-audited was full of undiscovered problems. The OpenSSL bugs dating back to 1998 were not found by fuzzing or human review despite decades of intensive scrutiny of one of the most security-critical codebases on the internet.

The reasonable interpretation is that most production software, including software you depend on today, contains bugs of similar age and severity that have not yet been found. The AI tools capable of finding them are available now.

Open-source maintainers face a volume problem

The curl situation illustrates a structural challenge: the same AI capabilities that enable high-quality vulnerability discovery also enable low-quality AI-generated noise at scale. Maintainers of widely-used open-source projects need noise filtering mechanisms that did not exist when bug reports were submitted one at a time by human researchers.
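One workable filter is to demand a reproducer and check it mechanically before a human ever reads the report. The sketch below is a hypothetical gate, not curl's actual process: it runs the project's binary on the submitted proof-of-concept input and only passes reports whose input demonstrably crashes or hangs the target.

```python
import os
import subprocess
import tempfile

def reproduces(target_cmd: list[str], poc_input: bytes, timeout: int = 10) -> bool:
    """Run the target on the submitted proof-of-concept input and report
    whether it actually misbehaves (dies on a signal, or hangs)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(poc_input)
        path = f.name
    try:
        result = subprocess.run(
            target_cmd + [path], capture_output=True, timeout=timeout
        )
        # On POSIX, a negative return code means the process was killed
        # by a signal (SIGSEGV, SIGABRT, ...), i.e. a real crash.
        return result.returncode < 0
    except subprocess.TimeoutExpired:
        return True  # a hang is also a finding worth a human's time
    finally:
        os.unlink(path)
```

Reports that fail the gate can be auto-rejected or deprioritized. The scarce resource being protected is maintainer attention, not compute, and a gate like this spends compute to conserve it.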


The Broader Picture

Anthropic published a 244-page system card for Mythos and withheld the model from public release specifically because of the cybersecurity capabilities this article describes. That decision, unusual in an industry defined by shipping, reflects an assessment that the defensive window, the period in which AI can be used to find and patch bugs before attackers develop similar capabilities, is real but finite.

Logan Graham, who leads offensive cyber research at Anthropic, estimates it will take six to eighteen months before similar capabilities proliferate beyond organizations committed to deploying them responsibly. Project Glasswing is designed to use that window to patch as much critical infrastructure as possible before it closes.

Whether or not that framing is precisely correct, the underlying dynamic is not in dispute. AI models are already finding vulnerabilities at a scale and depth that was not possible six months ago. The question for security teams is not whether to engage with this capability shift, but how quickly they can turn it on their own systems before someone else does.

The 17-year-old FreeBSD bug is patched. There are more like it.


Frequently Asked Questions

How is AI finding vulnerabilities that humans and automated tools missed for decades?

Traditional fuzzing tools find bugs by generating massive volumes of random inputs. They cannot read code or reason about what the code is meant to do. Claude reads and reasons about code the way a human security researcher would: looking at past fixes to find similar unaddressed bugs, spotting patterns that cause problems, and understanding logic well enough to identify what input would break it. This approach finds classes of bugs, particularly logical errors and subtle memory issues, that fuzzing cannot reliably detect.

What vulnerabilities has AI found in production software?

Claude found 22 Firefox vulnerabilities in two weeks, including 14 high-severity bugs, and more than 500 high-severity zero-days during the April 2026 MAD Bugs initiative. AISLE's AI found all 12 CVEs in the January 2026 OpenSSL security patch, including bugs that had gone undetected since 1998, roughly 27 years. Claude Mythos Preview found thousands of vulnerabilities across every major operating system and web browser, among them the 17-year-old FreeBSD remote code execution flaw that opens this article.

Is AI being used for offensive attacks, or just defense?

Both. Anthropic and research organizations use AI for responsible disclosure. But threat actors are already using similar capabilities offensively. Trend Micro reported in 2026 that attackers are automating vulnerability discovery with AI agents. XBOW, a commercial AI security tool, submitted more than 1,000 bug reports in a 90-day window including 132 classified as critical.

What should security teams do in response?

Start integrating frontier AI models into vulnerability management workflows now. Use currently available models against your own infrastructure before attackers do. Shorten patch cycles and treat CVE-tagged dependency updates as urgent. Invest in automated incident response pipelines, since higher rates of vulnerability discovery will produce more exploitation attempts in the window before patches ship.

Why did curl's bug bounty program close?

Curl's bug bounty closed in January 2026 after AI-generated low-quality submissions flooded it. About 20% of 2025 submissions were AI noise, and only 5% of all submissions were real vulnerabilities. The overhead became unsustainable for the small maintenance team. It illustrates a bifurcation: high-quality AI tools find real bugs, while low-quality AI tools generate noise that burdens maintainers.

Does this mean all the software I use is insecure?

Most production software probably contains undiscovered vulnerabilities similar to what AI has already found. That has always been true. What changed is that AI can now find them faster than the previous state of the art allowed. The sensible response is accelerated patching, AI-assisted security scanning on your own systems, and shorter windows between vulnerability discovery and deployment of fixes.

