Cyber Superstorms

How Powerful AI Vulnerability Discovery Capabilities Could Make Cyber Crises Routine

May 28, 2026

In this post, we present “Cyber Superstorms” as a useful lens for thinking about the consequences of AI-accelerated vulnerability discovery. Superstorms are acute events that place tremendous strain on defenders. The key question is not how many zero-days AI can find, but how often AI-accelerated discovery produces crises that overwhelm the defensive ecosystem.

After the announcement of Anthropic’s Mythos Preview and Project Glasswing, much of the initial public discussion focused on the sheer number of vulnerabilities Mythos allegedly discovered. Anthropic reported that the model had found thousands of “high-severity vulnerabilities” across every major operating system and web browser, with over 99% still unpatched at the time of announcement. Mozilla reported that Mythos identified 271 vulnerabilities in a single Firefox release, contributing to 423 security fixes shipped in April 2026 — more than across the previous 14 months combined.

While thousands of zero-days across every major operating system and browser has a powerful memetic quality, we think it’s not a very helpful framing for thinking about how AI-enabled vulnerability discovery capabilities translate into actual cyber risk. First, not all vulnerabilities are equally dangerous, and most are never exploited. According to CISA’s Known Exploited Vulnerabilities catalog, only 0.5% of the 48,000 publicly disclosed vulnerabilities were actively exploited in the wild. What matters is the combination of how severe a vulnerability is, how easily it can be weaponized, and how widely the affected software is deployed.

Second, more vulnerabilities do not linearly translate into more risk, because this framing abstracts away from the contexts on both sides of the offense-defense equation. As Joshua Saxe argued, technologies do not cause attackers to do cyberattacks, and operationalizing novel exploits requires significantly more skill and infrastructure than the discovery step alone.1 Attackers pursue their desired outcomes through the easiest available means, given their own social, economic, and political contexts.

That said, vulnerability supply matters: when severe zero-days reach capable actors, risk can jump sharply. The Shadow Brokers leak of NSA-developed exploits drove both WannaCry and NotPetya — two of the superstorm-class events discussed below — and is a natural experiment in what happens when a trove of weaponized zero-days suddenly becomes accessible.

On the defender side, counting vulnerabilities tells you little about the capacity to respond: how much friction exists in the remediation pipeline, how much incident response resourcing is available at any given moment, and what happens when patching simply cannot keep pace.

So if the question “how many zero-days can Mythos find?” is the wrong starting point, what could be a better one?

Thinking in superstorms

There’s a more useful way to contextualize the effects of accelerated vulnerability discovery. Rather than counting vulnerabilities, we can look at how often the cybersecurity ecosystem may face vulnerabilities severe enough to create systemic risk and overwhelm its capacity to respond. We can think about these events as cyber superstorms.2

In meteorology, superstorms are rare, unusually destructive weather events. This distinction is meaningful because routine storms are manageable, but superstorms are more likely to overwhelm emergency response capacity.

The history of cybersecurity in the past decade includes several events that we might call ‘cyber superstorms,’ such as Log4Shell in 2021, WannaCry in 2017, and the MOVEit mass exploitation of 2023.

These superstorms differ from ordinary cyber incidents in that they impose immense remediation burdens on defenders. When Log4Shell hit, the Cyber Safety Review Board characterized the response as requiring “thousands of security professionals across the globe” to mobilize simultaneously, and many spent weeks to months on remediation. In one case, a federal department reported that it delayed mission-critical work for weeks. ISC² found that 27% of organizations reported being less secure in other areas during Log4Shell remediation, because the effort diverted capacity they would normally devote to other defenses. And despite these efforts, 13% of Log4j downloads in 2025 were still pulling vulnerable versions, four years after disclosure.

A superstorm framing is helpful because it connects directly to defensive resourcing. For a CISO, a government cybersecurity agency, or a policymaker deciding how much to invest in incident response capacity, what matters is how the growing volume of vulnerabilities translates into risk in practice: how often will defenders face events that redline their response capacity? The relationship between cyber superstorm frequency and defensive resourcing costs is plausibly nonlinear. Two superstorms per year are not necessarily only twice as bad as one. They are potentially worse, because the second may hit while teams are still depleted from the first.

The baserate: How often have cyber superstorms actually happened?

We sketch out a back-of-the-napkin estimate to illustrate some of these intuitions. While we can quibble with the precise figures, this kind of exercise still seems useful because rough order-of-magnitude figures can reveal whether current defensive commitments are in the right ballpark, and because putting numbers on the argument makes our assumptions explicit.

So how often do cyber superstorms happen? We define a cyber superstorm as a vulnerability-centered event that forces industry-wide or cross-sector crisis-mode defensive mobilization for weeks or longer. Some such events are driven by observed mass exploitation; others are driven by the credible and imminent threat of mass exploitation. From the defender’s perspective, both can impose superstorm-level burdens. We identified candidates using the following criteria:

The affected software3 is deployed broadly, or sits at a chokepoint in the software supply chain where many downstream products inherit the same vulnerability.
Attackers can use the associated vulnerability without needing valid credentials, special access, or unusual skill.
Successful exploitation gives the attacker substantial control, e.g., running commands on a target system, bypassing login requirements, or extracting large volumes of sensitive data.
The response required sustained, cross-organizational mobilization over weeks or longer.4

Looking at the past twelve years, candidates include:

Heartbleed (2014): A bug in OpenSSL, the encryption software used by roughly two-thirds of all websites at the time, allowed attackers to steal passwords, encryption keys, and other sensitive data from server memory.
Shellshock (2014): A bug in Bash, a fundamental command-line tool built into nearly every Linux, Unix, and Mac server, allowed attackers to remotely execute commands on vulnerable web servers.
EternalBlue / WannaCry / NotPetya (2017): A flaw in Windows file-sharing that powered the WannaCry ransomware worm, which encrypted over 200,000 computers across 150 countries within days. The same vulnerability was reused in the NotPetya attack, which caused an estimated $10 billion in damages worldwide.
SIGRed (2020): A 17-year-old flaw in Windows DNS Server, which runs on virtually every Windows domain controller. CISA issued an emergency directive with an unusually short 24-hour patching deadline.
ZeroLogon (2020): A cryptographic flaw that let an unauthenticated attacker take over the core of an organization’s network identity system. CISA issued an emergency directive.
ProxyLogon (2021): A set of flaws in Microsoft Exchange, the email server used by hundreds of thousands of organizations worldwide, that allowed attackers to break in remotely without any credentials.
Pulse Connect Secure (2021): A pre-authentication flaw in widely deployed VPN appliances, exploited by suspected Chinese state-sponsored actors against defense, government, and financial targets. CISA issued an emergency directive.
PrintNightmare (2021): A flaw in Windows Print Spooler, a service running by default on most Windows systems. The initial patch was incomplete, forcing multiple rounds of emergency fixes. CISA issued an emergency directive.
ProxyShell (2021): A second pre-authentication attack chain in Microsoft Exchange, disclosed five months after ProxyLogon.
Log4Shell (2021): A bug in Log4j, a logging tool embedded in an estimated 93% of enterprise cloud environments. Attackers could take full control of affected systems using a single specially crafted text string. CISA issued an emergency directive.
MOVEit (2023): A bug in MOVEit Transfer, a file transfer tool widely used in government, healthcare, and finance, was exploited by the Cl0p criminal group before the vulnerability was publicly disclosed.
Ivanti Connect Secure (2024): Chained flaws in Ivanti VPN gateways. CISA issued an emergency directive.
ToolShell / SharePoint (2025): A pre-authentication attack chain in on-premise Microsoft SharePoint, exploited by multiple Chinese state-sponsored groups and ransomware actors.
Cisco ASA / FTD (2025): Chained flaws in widely deployed Cisco firewalls, exploited by state-sponsored actors. CISA issued an emergency directive requiring hard resets and forensic reporting.
React2Shell (2025): A flaw in React Server Components and Next.js, frameworks underlying tens of thousands of websites. A single HTTP request could give attackers full control of a vulnerable server.

This gives us 15 cyber superstorm-class events over 12 years (from 2014-2025), or ~1.25 events per year.

For per-event defensive resourcing costs, Log4Shell is the best-documented case and serves as a useful anchor. Arctic Wolf reported average incident response costs exceeding $90,000 per engagement, with 25% of its customer base targeted. ISC² found that 52% of security teams spent weeks to over a month on remediation. One federal cabinet department alone dedicated 33,000 hours to Log4j response, suggesting federal remediation costs in the tens of millions, with the far larger private sector multiplying that considerably. Industry leaders characterized total mitigation costs as in the billions. Taken together, a rough estimate of $1-10B in direct remediation and response costs seems plausible, with a wider range if you include downstream breach costs and the long-tail remediation that the CSRB said would take a decade.

How could AI-enabled vulnerability discovery change the frequency?

AI-augmented vulnerability discovery could increase superstorm frequency by:

Accelerating discovery in the existing vulnerability stock. If there is a large pool of undiscovered critical bugs in widely deployed software, and AI scans that pool faster and more thoroughly than human researchers, superstorm-class bugs will surface more often.5 Mythos finding a 27-year-old bug in OpenBSD, one of the most security-hardened systems in existence, suggests that pool may be larger than previously assumed. Also, Cloudflare ran Mythos against more than 50 of its own internal repositories and surfaced novel multi-stage exploit chains in code that had already been heavily audited.
Accelerating and expanding exploit development. The faster a discovered vulnerability reaches mass exploitation, the more likely it is to reach superstorm status before patching can intervene. AI can compose chains of moderate-severity vulnerabilities into high-impact exploits. Mythos demonstrated the ability to chain four separate vulnerabilities into a single webpage-to-kernel exploit. This expands the fraction of discovered vulnerabilities that can reach superstorm status. A moderate-severity bug might not qualify on its own, but when chained with other moderate bugs, it could lead to a superstorm-class event.

How much does this shift the baserate? The historical average is roughly 1.25 superstorms per year, though the average masks a trend: the first half of the period (2014–2019) saw roughly 0.5 per year, while the second half (2020–2025) saw roughly 2 per year. This suggests the baseline was already rising before AI-accelerated discovery entered the picture.6
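The split can be reproduced with a few lines of Python. This is a minimal sketch: the per-year counts are tallied from the candidate list above, using the year shown next to each event, and the names `EVENTS_BY_YEAR` and `events_per_year` are ours, chosen for illustration.

```python
# Number of candidate superstorms per year, tallied from the list above.
# Years with no listed event are simply absent.
EVENTS_BY_YEAR = {2014: 2, 2017: 1, 2020: 2, 2021: 5, 2023: 1, 2024: 1, 2025: 3}


def events_per_year(first_year: int, last_year: int) -> float:
    """Average number of listed events per calendar year, bounds inclusive."""
    n_years = last_year - first_year + 1
    n_events = sum(
        count for year, count in EVENTS_BY_YEAR.items()
        if first_year <= year <= last_year
    )
    return n_events / n_years


overall = events_per_year(2014, 2025)      # 15 events over 12 years
first_half = events_per_year(2014, 2019)   # 3 events over 6 years
second_half = events_per_year(2020, 2025)  # 12 events over 6 years

print(f"2014-2025: {overall:.2f} per year")
print(f"2014-2019: {first_half:.2f} per year")
print(f"2020-2025: {second_half:.2f} per year")
```

With so few events, moving a single candidate in or out of the list shifts the half-period rates noticeably, so the split is better read as a direction of travel than as a precise measurement.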

How much the rate changes further depends on a few key variables. One is how effectively attackers operationalize the increased discovery. Another is how effectively defensive programs like Glasswing or OpenAI’s Trusted Access for Cyber offset the increased risks.7 A third is how large the undiscovered vulnerability stock actually is, and whether AI sustains an elevated discovery rate or whether this tapers off.

We illustrate some of the potential effects using three uplift scenarios8:

In a conservative scenario, AI vulnerability discovery capabilities prove useful but not transformative. Defensive programs like Glasswing and OpenAI’s Trusted Access for Cyber offset some of the increased discovery rate. Attacker adoption is slowed by organizational and infrastructure frictions. Cyber superstorm frequency rises modestly (1.5-2x), to 2-3 per year.
In a moderate scenario, AI-augmented discovery surfaces enough superstorm-class vulnerabilities to regularly outpace the patching pipeline. Exploit chaining capabilities mean that moderate-severity bugs in widely deployed software increasingly combine into superstorm-class attack chains. A wider range of attacker groups benefit from cheaper exploit development. Defensive programs help but cannot keep pace. Frequency rises to 4-6 superstorms per year (3-5x).
In a severe scenario, Mythos-class capabilities (and above) proliferate to open-source models within months. Attackers invest heavily in leveraging these capabilities, and the exploit chaining effect dramatically expands the pool of vulnerability combinations that can reach superstorm status. Here, let’s say we now face 8-12 superstorms per year (6-10x).
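As a rough consistency check, the quoted multipliers can be applied to the historical baserate of 15 events over 12 years. The sketch below does only that arithmetic; the names `UPLIFT` and `implied_frequency` are ours, and the multipliers are the illustrative ranges from the three scenarios, not estimates derived from data.

```python
# Historical average from the candidate list: 15 events over 12 years.
BASELINE_PER_YEAR = 15 / 12

# (low, high) frequency multipliers quoted for each scenario.
UPLIFT = {
    "conservative": (1.5, 2.0),
    "moderate": (3.0, 5.0),
    "severe": (6.0, 10.0),
}


def implied_frequency(scenario: str) -> tuple[float, float]:
    """Superstorms per year implied by applying the multipliers to the baseline."""
    low, high = UPLIFT[scenario]
    return BASELINE_PER_YEAR * low, BASELINE_PER_YEAR * high


for name in UPLIFT:
    low, high = implied_frequency(name)
    print(f"{name}: {low:.2f} to {high:.2f} superstorms per year")
```

The products land close to the rounded ranges used in the scenarios. The per-year figures in the text are round numbers, so small mismatches at the edges are expected.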

Implications for defenders

Even in the conservative uplift scenario, using Log4Shell’s estimated $1B-10B in direct costs as a rough per-event anchor, the shift would be significant. The current baseline of roughly one superstorm every year implies an annualized cost of ~$1B-12B.

Under the conservative scenario of 2-3 per year, this rises to $2B-30B.
Under the moderate scenario of 4-6 per year, that rises to $4B-60B.
Under the severe scenario of 8-12 per year, it could reach $8B-120B.
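These ranges are simple products of frequency and per-event cost. A minimal sketch of the arithmetic follows, assuming (as the estimate itself does) that each event costs the same $1B-10B regardless of how many others occur that year. The names `FREQUENCY` and `annual_cost_range` are ours, and the severe row uses the 8-12 per year from the scenario definitions.

```python
# Per-event direct cost anchor in billions of USD (the rough Log4Shell estimate).
COST_PER_EVENT_B = (1.0, 10.0)

# Superstorms per year, (low, high), for the baseline and each scenario.
FREQUENCY = {
    "baseline": (1.25, 1.25),
    "conservative": (2, 3),
    "moderate": (4, 6),
    "severe": (8, 12),
}


def annual_cost_range(scenario: str) -> tuple[float, float]:
    """Annualized direct cost range in billions of USD.

    Pairs the lowest frequency with the lowest per-event cost and the
    highest frequency with the highest per-event cost.
    """
    freq_low, freq_high = FREQUENCY[scenario]
    cost_low, cost_high = COST_PER_EVENT_B
    return freq_low * cost_low, freq_high * cost_high


for name in FREQUENCY:
    low, high = annual_cost_range(name)
    print(f"{name}: ${low:g}B to ${high:g}B per year")
```

The baseline row comes out at $1.25B to $12.5B, which the text rounds to ~$1B-12B.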

Even a moderate scenario implies direct costs that represent a meaningful share of the ~$244B in global cybersecurity spending projected for 2026. And these estimates likely understate the costs of remediation and the burden placed on defenders because they assume that superstorms are independent of one another. In reality, it will matter a lot to defenders if they are hit by the third superstorm of the year after already being strained by the first two.
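To get a feel for how often that third-superstorm situation might arise, one can treat arrivals as a Poisson process. This is purely an illustrative assumption of ours, not something the historical record establishes, and real events may cluster more than a Poisson model allows. The sketch below uses the historical rate and the midpoint of each scenario range; `prob_at_least` is a hypothetical helper name.

```python
import math


def prob_at_least(k: int, rate_per_year: float) -> float:
    """P(N >= k) for N ~ Poisson(rate_per_year).

    Assumes independent arrivals at a constant rate, which is a
    simplification made for illustration only.
    """
    p_fewer = sum(
        math.exp(-rate_per_year) * rate_per_year**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Historical rate, then the midpoint of each scenario's per-year range.
RATES = {"historical": 1.25, "conservative": 2.5, "moderate": 5.0, "severe": 10.0}

for name, rate in RATES.items():
    print(f"{name}: P(3 or more in a year) = {prob_at_least(3, rate):.2f}")
```

Under this assumption, a year with three or more superstorms has a probability of roughly 13% at the historical rate and roughly 88% at five per year. Compounding strain would shift from an occasional bad year to the expected condition.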

Compare these figures to what has actually been pledged. Both Anthropic and OpenAI have made significant commitments to defensive vulnerability discovery, including $100M in usage credits from Anthropic and $10M in API credits plus expanded model access from OpenAI. These are valuable and important. But they are focused on the discovery side of the pipeline, which is already the part AI accelerates most effectively. The superstorm costs estimated above fall overwhelmingly on the remediation end. As one analysis of Glasswing put it: the finding problem has been solved, but nobody has solved the fixing problem.

For policymakers, CISOs, and cybersecurity agencies, the practical takeaway is straightforward: plan for more frequent cyber superstorms. The historical baserate of roughly one superstorm per year has been manageable, if painful. A world with two to five per year is likely not. Preparing for that world requires investment in accelerating remediation, including AI-augmented defensive tools that focus on the fixing problem, not just the finding problem.

1. That gap is narrowing, however. AI vulnerability research is starting to make inroads on the weaponization side as well. See the recent Google Threat Intelligence Group disclosure around the first documented case of a threat actor using an AI-developed zero-day ‘in the wild’.

2. This is, of course, only one lens on the potential consequences of Mythos-class capabilities. A cumulative damages approach, as in GovAI’s recent work on global cybercrime cost baselines, captures the full distribution of harm rather than just the catastrophic tail. An actor and bottleneck analysis, as Saxe advocates, asks which specific attacker constituencies will be unblocked and how much organizational friction will slow adoption. We focus on the superstorm framing here because it captures something these other approaches tend to miss: the concentrated mobilization problem, where the entire defensive ecosystem is in crisis mode simultaneously, and compounding failures make each additional event disproportionately costly.

3. We focus specifically on software vulnerabilities because that is what Mythos-class capabilities target. This excludes hardware architecture-related exploits (e.g., Spectre/Meltdown) and supply chain compromises (e.g., SolarWinds), both of which triggered superstorm-level mobilization but through mechanisms outside our scope of AI-accelerated vulnerability discovery.

4. One signal of this is whether CISA issued an Emergency Directive ordering federal agencies to patch within days, though some of the events we list precede the establishment of this practice.

5. For the scenarios, we treat this stock as effectively deep, but it’s possible AI is clearing accumulated low-hanging fruit that built up over years. If so, discovery rates could settle back to a baseline tied to the amount of new code being written. The shape of the curve also matters. Discrete boom phases of vulnerability discovery, with each new model generation, would produce a different cyber storm trajectory than smooth, continuous growth in vulnerability discovery.

6. Some of this pre-AI rise likely reflects growing total code volume across every sector and maturing discovery tools (such as fuzzing), independent of any AI effect.

7. We don’t try to net out AI’s effect on producing secure (or insecure) code. Developers may increasingly use AI to harden code pre-deployment. Alternatively, AI-generated software may ship with weaker security practices or novel bug patterns.

8. Our scenarios hold the storm-per-CVE rate roughly constant as a simplifying assumption. A few different things could shift the storm-per-CVE rate. For example, AI-discovered bugs may skew toward lower-severity flaws, which would push the rate down.

A guest post by

Christopher

Cyber and AI policy
