# Too Good to Ship: When Your AI Finds Every Lock's Weakness
*Anthropic built a model so good at hacking that they won't release it. Project Glasswing raises a question the industry can't dodge anymore.*
Imagine you build a locksmith robot. It's brilliant — it can open any lock, find flaws in any safe, spot weaknesses in any vault. Then you realize: if you sell this thing, anyone can rob a bank.
That's basically what Anthropic just did with Claude Mythos.
## What happened
Anthropic announced Project Glasswing this week — and the headline isn't what the model can do. It's what they won't let you do with it.
Claude Mythos is a general-purpose model, similar to Opus 4.6, but with cybersecurity skills that made Anthropic hit the brakes. According to their system card:
> Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.
Read that again. Every. Major. OS. And browser.
In one test, Mythos chained four vulnerabilities into a working browser exploit — renderer sandbox escape, OS sandbox escape, the whole chain — autonomously. On its own, it also found kernel race conditions and bypassed KASLR, the address-space randomization that's supposed to make kernel exploits harder.
## The decision: don't ship it
Instead of releasing Mythos to everyone, Anthropic created Project Glasswing — a restricted program where only vetted security partners get access. The idea: let the good guys find and patch vulnerabilities before the model (or something like it) reaches the wild.
Here's what that looks like:
| Who gets access | What they do |
|---|---|
| OS vendors | Patch kernel vulnerabilities |
| Browser teams | Fix sandbox escapes |
| Infrastructure companies | Harden endpoints |
| Security researchers | Red-team critical systems |
Everyone else? You wait.
## Why this matters — and why it's complicated
I think this is the right call. Here's my reasoning: a model that can autonomously write working exploits against every major platform is not a "cool demo." It's a weapon. Releasing it openly would be like publishing the keys to the internet and hoping the locksmiths move faster than the thieves.
But — and this is a real but — this sets a precedent.
Who decides what's "too dangerous"? Right now, it's Anthropic deciding about their own model, which is defensible. But what happens when:
- A competitor claims their model is "too capable" to open-source (conveniently protecting their business model)?
- Governments start mandating that certain AI capabilities require licenses?
- The definition of "dangerous capability" expands from security exploits to, say, biotech or persuasion?
The line between responsible caution and gatekeeping is thinner than it looks.
## The open-source tension
This lands at an interesting moment. Meta just went closed with Muse Spark after years of open-weights Llama releases. Now Anthropic is restricting a model on safety grounds. The trend line isn't subtle.
There's a reasonable argument that the open-weights crowd should pay attention: if AI models keep getting more capable, the "release everything, let the community sort it out" approach has a ceiling. And maybe we just hit it.
There's an equally reasonable argument from the other side: concentrated access to the most powerful models creates a world where only a few companies and their friends can find vulnerabilities. That's not necessarily safer — it's just a different kind of risk.
## My take
I don't think Anthropic is being cynical here. The technical evidence in their red team report is specific and verifiable — this isn't vague hand-waving about "potential harms." These are working exploits against real systems.
But I do think the industry needs to figure out governance for this, fast. Right now it's voluntary restraint by one company. That's a good start and a terrible long-term plan.
The question isn't "should Anthropic restrict Mythos?" — it's "what happens when the next company builds something similar and doesn't?"
That's the lock nobody's picked yet.
## Sources
- Project Glasswing — Anthropic — official announcement of the restricted access program
- Claude Mythos System Card (PDF) — technical red team report with vulnerability findings
- Simon Willison's coverage — analysis and context on the Glasswing announcement