
April 7 - A "Watershed Moment" for Cyber?

I could hardly sleep after reading about Mythos Preview, the first frontier model that appears highly capable of finding AND creating functional exploits in widely used software.

Marc Brawner · Founder · April 8, 2026 · 9 min read

The world of cybersecurity has fundamentally changed.

— Cisco, April 7 2026

We view this as a watershed moment for security.

— Anthropic, April 7, 2026

The reckoning will arrive sooner than most leaders expect. ... Now is the time to modernize cybersecurity stacks everywhere.

— Palo Alto Networks, April 7, 2026


Just 3 weeks ago during RSA, I joined a few industry colleagues as panelists on an InfraGard webinar titled Vulnerability at Warp speed: AI's Impact on Infrastructure Defense. There, we discussed how emerging AI developments underscored an urgent need for teams to revisit and accelerate legacy vulnerability management processes—especially those involving custom applications with open-source dependencies. We discussed how the latest frontier models were getting better at finding software flaws, with the caveat that they were not very capable of exploiting them on their own—thus giving defenders a decent window to get ahead of the curve.

Then Anthropic's frontier security team dropped its April 7, 2026 article. I could hardly sleep last night, caught between excitement about further advancements and opportunities in our field and the knowledge that this portends a busy future for defenders and responders.

In this article, the team soberly describes how their latest model, Mythos, was able not only to find, but to autonomously develop, working exploits against multiple widely used and hardened products and platforms. Further, it found new flaws at a scale that simply blew away their previous benchmarking. They detail examples of Mythos uncovering significant vulnerabilities in code that has been heavily scrutinized for decades, along with what are likely thousands more high-severity issues across other platforms that have yet to be made public.

So many, in fact, that they have brought in outside help to sort through it all, and are spoon-feeding maintainers so as not to overwhelm them with issues. In what appears to be a first, they outlined 15 examples of newly discovered vulnerabilities across web browsers, mobile phones, cryptographic libraries, and operating systems. For each, they provide only a SHA-3 hash of the current documentation on the issue, which will serve as proof of today's discoveries once the details can be released publicly.
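For readers unfamiliar with the idea, publishing a hash now and the document later is a simple form of commitment. The sketch below is a minimal illustration, not Anthropic's actual process: the choice of SHA3-256, the function names `commit` and `verify`, and the sample text are all assumptions made for the example.

```python
import hashlib
import hmac


def commit(document: bytes) -> str:
    """Return a SHA3-256 hex digest that can be published today.

    SHA3-256 is an assumption; the article only says "SHA-3".
    """
    return hashlib.sha3_256(document).hexdigest()


def verify(document: bytes, published_digest: str) -> bool:
    """Check a later-released document against the earlier digest."""
    return hmac.compare_digest(commit(document), published_digest)


# Hypothetical write-up that stays private until a fix has shipped.
report = b"Hypothetical vulnerability write-up, kept private for now."
digest = commit(report)
print(digest)  # only this value is made public up front

# Later: anyone holding the released report can confirm it is unchanged.
print(verify(report, digest))
print(verify(report + b" (edited afterwards)", digest))
```

One design note: a bare hash only protects the contents if the document is hard to guess, so commitment schemes often mix in a random value that is revealed along with the document.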

Along with this, Anthropic announced the launch of Project Glasswing, a concerted effort to bring together many of the world's top technology firms (including competitors) to rapidly harden their products using this model, before similar capabilities can reach the general public.

To some, that last step seems a bit self-serving. While this may be true, at least in part, I think there is good reason to pay attention to this one. I'll let the research speak for itself. Here are a few notable quotes and findings buried in the release, the full red team report, and the related Mythos Preview system card. Emphasis is mine:

Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit.

This supports the idea that AI is raising the floor, making any semi-skilled user adept in areas that were once accessible only to specialists.

These capabilities have emerged very quickly. ... Opus 4.6 generally had a near-0% success rate at autonomous exploit development. But Mythos Preview is in a different league.

This is indeed a major step change for a frontier model. Anthropic's Mythos Preview system card also indicates Mythos scored 83% on CyberGym.io, a research program aimed at testing AI's ability to analyze vulnerabilities. The previous top score to date was 66% by Opus 4.6, followed by 60% by GPT-5. Models from 2025, like Opus 4.1, scored only 25%. The team also indicates that Mythos Preview is so effective that it "mostly saturates their previous internal benchmarks".

We did not explicitly train Mythos Preview to have these capabilities.

Dario Amodei, Anthropic's CEO, notes in the accompanying video on Glasswing that the model was trained to be good at code, and that its cyber capabilities were a side effect. Given that software development is a massive revenue generator and one of the fields most impacted by LLMs to date, these efforts will likely be duplicated by other global AI firms in due course.

...We found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser when directed by a user to do so. 

The team notes several times that the model is significantly better at pursuing long-range tasks than prior models, and can do so autonomously, much like a human researcher. This is a key development, as current models' reasoning often falls apart when left alone for too long.

This model is able to create exploits out of three, four, or five vulnerabilities that in sequence give you [a] sophisticated outcome. ... I've found more bugs in the last couple of weeks than I found in the rest of my life, combined. -Nicholas Carlini, Anthropic Research Scientist

This aspect can't be overemphasized. Most vulnerability management paradigms and timelines for remediation are built around metrics that consider active or potential exploitation on an individual basis. Now that models are proving able to chain existing lower-risk exploits, this brings countless existing issues back into play.
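To make the chaining concern concrete, here is a small sketch in which no single finding is rated high, yet together they form a path from outside access to admin. Everything in it is hypothetical: the finding names, the severity labels, and the access levels (`outside`, `foothold`, `user`, `admin`) are invented for illustration and do not come from the Anthropic report.

```python
from collections import deque

# Hypothetical findings: (name, severity, access required, access gained).
FINDINGS = [
    ("verbose-error-leak", "low", "outside", "foothold"),
    ("weak-session-check", "medium", "foothold", "user"),
    ("local-privesc", "medium", "user", "admin"),
    ("log-injection", "low", "user", "user"),
]


def find_chain(findings, start, goal):
    """Breadth-first search for a sequence of findings leading start -> goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, _severity, needs, gains in findings:
            if needs == state and gains not in seen:
                seen.add(gains)
                queue.append((gains, path + [name]))
    return None  # no chain connects start to goal


# Triage that looks at each finding alone sees nothing rated "high".
print([name for name, sev, _, _ in FINDINGS if sev == "high"])

# Triage that looks at combinations finds a full path to admin.
print(find_chain(FINDINGS, "outside", "admin"))
```

A remediation timeline keyed only to the first check would leave all three chained findings in the slow lane, which is the gap described above.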

In basic terms, considering the venerable CIA triad (confidentiality, integrity, and availability), organizations often give maximum weight to protecting system availability when deciding the speed and scope of patching, especially for vulnerabilities deemed low risk. With the rapid and continued advancement of generative AI, organizations must shift this risk calculus to accept more availability risk, enabling more frequent and broader patching, while increasing resiliency and recovery capabilities accordingly in case a patch has an unexpected impact.

FFmpeg is one of the most thoroughly tested software projects in the world. ... [A bug first introduced in 2003] was turned into a vulnerability [in 2010] when the code was refactored. Since then, this weakness has been missed by every fuzzer and human who has reviewed the code, and points to the qualitative difference that advanced language models provide.

the sheer scalability ... allows us to search for bugs in essentially every important file, even those that we might naturally write off by thinking, “obviously someone would have checked that before”.

This is where GenAI excels across many disciplines: the places where scale and speed provide a tremendous advantage, while the downsides of inconsistent and non-deterministic outputs along the way aren't particularly harmful.

Creating this exploit ... cost under $1000 at API pricing, and took half a day to complete.

This exploit was somewhat more challenging ... Nevertheless, the complete pipeline took under a day to complete at a price of under $2,000.

Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings.

This aspect is not getting a lot of attention so far. Today's street price for executing these tests is low relative to paid expertise, but not everyone or every project out there can afford to spend thousands on testing.

The team goes on to outline several foundational takeaways:

  1. Mitigations whose security value comes primarily from friction rather than hard barriers may become considerably weaker against model-assisted adversaries.

  2. Ultimately, it’s about to become very difficult for the security community.

  3. ... we believe the capabilities that future language models bring will ultimately require a much broader, ground-up reimagining of computer security as a field.

  4. Imagining a future where language models become much stronger still is difficult; it is tempting to hope that future models won’t continue to improve at the current rate. But we should prepare with the belief that the current trend is likely to continue, and that Mythos Preview is only the beginning.

Anthropic indicates it is not planning to publicly release Mythos in its current form, and is working on additional safeguards for this and future models to limit publicly available exploitation capabilities.

What next?

Regardless of your current level of investment in AI or thoughts about the future—if you use software, your security program will be impacted by this and future advancements in state-of-the-art language models.

I agree with the Anthropic team that the near term will be more difficult for defenders. Why? As even the Mythos Preview data indirectly underscores, we are still in a place where an attacker need only find one way in. Here, it is "find one working exploit" that could impact countless organizations at once, rather than "find one way into an organization", but the same logic applies. And finding the way in, in AI terms, is not a single discrete attempt: today's AI can make as many attempts as one has the budget to pay for. The Anthropic team noted this in its report regarding the critical OpenBSD exploit, where the one successful attempt cost $50, but they made roughly 1,000 attempts at a total cost of under $20,000.
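The budget arithmetic behind this point can be sketched in a few lines. The run count and total cost below are the rounded figures from the quoted report (about a thousand runs for under $20,000); the per-attempt success rate is purely a hypothetical assumption, as is the simplification that attempts are independent. The function names are mine.

```python
def average_cost(total_cost: float, attempts: int) -> float:
    """Average spend per attempt."""
    return total_cost / attempts


def affordable_attempts(budget: float, cost_per_attempt: float) -> int:
    """How many whole attempts a given budget buys."""
    return int(budget // cost_per_attempt)


def chance_of_any_success(per_attempt_rate: float, attempts: int) -> float:
    """Probability that at least one attempt succeeds, assuming independence."""
    return 1.0 - (1.0 - per_attempt_rate) ** attempts


TOTAL_COST = 20_000   # upper bound quoted in the report, in dollars
RUNS = 1_000          # "a thousand runs" quoted in the report
ASSUMED_RATE = 0.001  # hypothetical: one success per thousand attempts

cost_each = average_cost(TOTAL_COST, RUNS)
for budget in (1_000, 20_000, 100_000):
    n = affordable_attempts(budget, cost_each)
    odds = chance_of_any_success(ASSUMED_RATE, n)
    print(f"${budget:>7,}: {n:>5} attempts, {odds:.1%} chance of a hit")
```

Under these assumptions the odds of a hit climb steadily with spend, which is why an attacker's budget, rather than any single attempt, is the quantity to watch.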

Longer term, I am less convinced that defenders will have the upper hand. Yes, if everyone adopts the same AI models with the same level of sophistication, AND we all have the same budget to spend on model execution as our most significant adversary, then perhaps we can get there. Or a model would have to be so good that it can reliably find all conceivable flaws for minimal cost. For many teams and independent developers, however, spending even $1,000 to test one component for vulnerabilities will be prohibitive. Glasswing's commitment of $100M in token budget, which is commendable, underscores the point.

No one can predict what other vulnerabilities will be newly found or chained together in the future, but the trajectory is clear. Our best defense remains a good offense, and the time to act is now. The window of opportunity to refine and accelerate our vulnerability management programs just grew that much shorter.

Use this time wisely to plan, inform your organization's leadership, and take decisive action in the weeks and months ahead to reduce your exposure over the short and long term.


At Auxiris, we help sophisticated individuals and organizations who want to understand, prepare for and protect against emerging risks—starting today.