How Mythos Preview Redefined Security Analysis in Project Glasswing

Over the past few months, our team has been testing a suite of security-focused large language models (LLMs) on our internal infrastructure. These models are designed to uncover vulnerabilities in our systems before malicious actors can exploit them, and they also offer a glimpse of what attackers might achieve with state-of-the-art AI. Of the models we evaluated, one stood out: Mythos Preview, developed by Anthropic. As part of Project Glasswing, we were granted early access to Mythos Preview and directed it at more than fifty of our own code repositories. This article details our observations: what Mythos Preview excelled at, where it fell short, and how the architecture and processes surrounding such models must evolve to enable effective large-scale use.

A New Class of Security LLM

Before diving into specifics, it's important to acknowledge that Mythos Preview represents a genuine leap forward. We've been running models against our code for some time, and the improvement from previous general-purpose frontier models to Mythos Preview isn't merely incremental. It is a different kind of tool performing a different kind of work, not a refined version of earlier tools. As a result, direct comparisons to older models are less meaningful. It's more illuminating to describe what Mythos Preview can actually do, so we highlight two standout capabilities we observed during our testing.

Source: blog.cloudflare.com

Exploit Chain Construction

Real-world attacks rarely rely on a single bug. They chain together multiple small vulnerabilities, known as attack primitives, into a fully functional exploit. For example, an attacker might convert a use-after-free bug into an arbitrary read/write primitive, hijack control flow, and then use return-oriented programming (ROP) chains to seize complete control of a system. Mythos Preview excels at this chaining process. It can take several separate primitives and reason about how to combine them into a working proof-of-concept. The reasoning it displays along the way resembles the work of a senior security researcher rather than the output of an automated scanner. This is a critical differentiator: other models might identify individual bugs, but they struggle to stitch them together into coherent attack chains.
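To make the idea of chaining concrete, here is a minimal sketch that treats each primitive as an abstract step with preconditions and effects, then searches for an ordering that reaches a goal. It is purely illustrative: the `Primitive` type, the `find_chain` function, and the capability labels are hypothetical names we introduce here, and the greedy search is a stand-in for the idea, not a description of how Mythos Preview reasons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """An abstract attack primitive (hypothetical model, not real exploit code)."""
    name: str
    requires: frozenset  # capabilities needed before this step applies
    grants: frozenset    # capabilities gained after this step


def find_chain(primitives, start, goal):
    """Greedy forward chaining: apply any usable primitive that adds a new capability."""
    have = set(start)
    chain = []
    remaining = list(primitives)
    progress = True
    while goal not in have and progress:
        progress = False
        for p in remaining:
            if p.requires <= have and not p.grants <= have:
                have |= p.grants
                chain.append(p.name)
                remaining.remove(p)
                progress = True
                break  # restart the scan with the enlarged capability set
    return chain if goal in have else None


# Capability labels mirror the example in the text; they are plain strings.
primitives = [
    Primitive("rop_chain", frozenset({"control_flow_hijack"}), frozenset({"code_execution"})),
    Primitive("uaf_to_rw", frozenset({"use_after_free"}), frozenset({"arbitrary_rw"})),
    Primitive("overwrite_pointer", frozenset({"arbitrary_rw"}), frozenset({"control_flow_hijack"})),
]

chain = find_chain(primitives, {"use_after_free"}, "code_execution")
print(chain)
```

The greedy search can include steps that turn out to be unnecessary and says nothing about whether a step is feasible on a real target; that judgment is exactly where the model's reasoning does the heavy lifting.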

Proof Generation

Finding a bug is one thing; proving it's exploitable is another. Mythos Preview handles both. It writes code that triggers the suspected vulnerability, compiles that code in a sandboxed environment, and executes it. If the program behaves as the model predicted, that's the proof. If it doesn't, the model reads the failure, adjusts its hypothesis, and tries again. This iterative loop is as important as the bugs it finds, because a suspected flaw without a working proof remains mere speculation. Mythos Preview closes that gap autonomously.
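The loop described above can be sketched roughly as follows. This is a simplified illustration under our own assumptions, not the harness we ran: `propose` is a hypothetical callable standing in for the model (it receives the previous failure output and returns a candidate program plus the output it predicts), and `run_isolated` uses a child Python process with a timeout as a placeholder for a real sandbox. A child process is not a security boundary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(source: str, timeout_s: float = 5.0) -> str:
    """Placeholder for a sandbox: run the candidate in a child process.

    Assumption for illustration only. A real deployment would need a
    properly isolated environment, not just a subprocess with a timeout.
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "candidate.py"
        path.write_text(source)
        try:
            result = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return "TIMEOUT"
        return result.stdout + result.stderr


def prove(propose, run=run_isolated, max_attempts=3):
    """Iterate: propose a candidate, run it, compare against the prediction."""
    feedback = None  # no failure output before the first attempt
    for attempt in range(1, max_attempts + 1):
        source, predicted = propose(feedback)
        observed = run(source)
        if predicted in observed:
            # The program behaved as predicted: treat that as the proof.
            return {"proved": True, "attempts": attempt, "observed": observed}
        feedback = observed  # hand the failure back for the next hypothesis
    return {"proved": False, "attempts": max_attempts, "observed": feedback}
```

Passing the runner in as a parameter keeps the loop independent of the execution environment, so the same logic can be exercised with a stub in tests and with a hardened sandbox in production.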

Some of these capabilities aren't entirely unique to Mythos Preview. When we ran other frontier models through the same harness, they discovered a fair number of the same underlying bugs, and in some cases their reasoning was surprisingly advanced. However, they consistently fell short at the point of stitching the pieces together. A model might identify a use-after-free vulnerability and even suggest a potential exploitation path, but it couldn't integrate that with other primitives to form a complete exploit chain. Mythos Preview's ability to do so is what sets it apart.

Architectural Implications and Future Directions

Our experience with Mythos Preview has also highlighted several architectural and process changes needed to deploy such models at scale. First, the model's iterative proof-generation loop requires a secure, isolated execution environment that can compile and run arbitrary code without risk to production systems. Second, the reasoning traces produced by Mythos Preview are valuable for auditors and developers, but they must be stored and indexed efficiently to support post-analysis. Third, the sheer complexity of exploit chains demands robust logging and visualization tools to help humans understand the model's logic.
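As one possible shape for the trace-storage requirement, the sketch below keeps each reasoning step as a row in SQLite, indexed by repository and finding so an auditor can pull back a full trace in order. The schema, the function names, and the choice of SQLite are assumptions made for illustration; the requirement itself does not prescribe a storage engine.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a trace store. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS traces (
               id INTEGER PRIMARY KEY,
               repo TEXT NOT NULL,
               finding TEXT NOT NULL,
               step INTEGER NOT NULL,
               body TEXT NOT NULL
           )"""
    )
    # A composite index keeps per-finding lookups fast as the table grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_traces_repo_finding "
        "ON traces (repo, finding, step)"
    )
    return conn


def record_step(conn, repo, finding, step, body):
    """Append one reasoning step; the context manager commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO traces (repo, finding, step, body) VALUES (?, ?, ?, ?)",
            (repo, finding, step, body),
        )


def steps_for(conn, repo, finding):
    """Return (step, body) pairs for one finding, in step order."""
    rows = conn.execute(
        "SELECT step, body FROM traces WHERE repo = ? AND finding = ? ORDER BY step",
        (repo, finding),
    )
    return rows.fetchall()
```

Storing steps individually rather than as one blob means a visualization tool could render a chain incrementally, or diff two attempts at the same finding.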

We also observed that Mythos Preview performs best when given high-quality input: well-documented code with clear context. Repositories with sparse comments or ambiguous variable names led to more false positives and longer iteration times. This suggests that future LLM-assisted security tools will benefit from tighter integration with code documentation and static analysis pipelines.

Conclusion

Mythos Preview is a powerful addition to the security analyst's toolkit, particularly for its ability to construct exploit chains and generate proofs autonomously. While it shares some foundational capabilities with other frontier models, its unique strengths in reasoning and chaining make it a standout in Project Glasswing. As we continue to refine our testing harness and explore ways to integrate such models into our security workflows, we are optimistic about the potential for AI to transform vulnerability discovery and remediation. The key will be developing the infrastructure and processes that can harness these capabilities safely and effectively at scale.
