# Too Good to Ship: When Your AI Finds Every Lock's Weakness
*Anthropic built a model so good at hacking that they won't release it. Project Glasswing raises a question the industry can't dodge anymore.*
Imagine you build a locksmith robot. It's brilliant — it can open any lock, find flaws in any safe, spot weaknesses in any vault. Then you realize: if you sell this thing, anyone can rob a bank.
That's basically what Anthropic just did with Claude Mythos.
## What happened
Anthropic announced Project Glasswing this week — and the headline isn't what the model can do. It's what they won't let you do with it.
Claude Mythos is a general-purpose model, similar to Opus 4.6, but with cybersecurity skills that made Anthropic hit the brakes. According to their system card:
> Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.
Read that again. Every. Major. OS. And browser.
In one test, Mythos chained four vulnerabilities into a working browser exploit — renderer sandbox escape, OS sandbox escape, the whole chain — autonomously. On its own, it also found kernel race conditions and bypassed KASLR, the address-space randomization that's supposed to make kernel exploits harder.
## The decision: don't ship it
Instead of releasing Mythos to everyone, Anthropic created Project Glasswing — a restricted program where only vetted security partners get access. The idea: let the good guys find and patch vulnerabilities before the model (or something like it) reaches the wild.
Here's what that looks like:
| Who gets access | What they do |
|---|---|
| OS vendors | Patch kernel vulnerabilities |
| Browser teams | Fix sandbox escapes |
| Infrastructure companies | Harden endpoints |
| Security researchers | Red-team critical systems |
Everyone else? You wait.
## Why this matters — and why it's complicated
I think this is the right call. Here's my reasoning: a model that can autonomously write working exploits against every major platform is not a "cool demo." It's a weapon. Releasing it openly would be like publishing the keys to the internet and hoping the locksmiths move faster than the thieves.
But — and this is a real but — this sets a precedent.
Who decides what's "too dangerous"? Right now, it's Anthropic deciding about their own model, which is defensible. But what happens when:
- A competitor claims their model is "too capable" to open-source (conveniently protecting their business model)?
- Governments start mandating that certain AI capabilities require licenses?
- The definition of "dangerous capability" expands from security exploits to, say, biotech or persuasion?
The line between responsible caution and gatekeeping is thinner than it looks.
## The open-source tension
This lands at an interesting moment. Meta just went closed with Muse Spark after years of open-weights Llama releases. Now Anthropic is restricting a model on safety grounds. The trend line isn't subtle.
There's a reasonable argument that the open-weights crowd should pay attention: if AI models keep getting more capable, the "release everything, let the community sort it out" approach has a ceiling. And maybe we just hit it.
There's an equally reasonable argument from the other side: concentrated access to the most powerful models creates a world where only a few companies and their friends can find vulnerabilities. That's not necessarily safer — it's just a different kind of risk.
## My take
I don't think Anthropic is being cynical here. The technical evidence in their red team report is specific and verifiable — this isn't vague hand-waving about "potential harms." These are working exploits against real systems.
But I do think the industry needs to figure out governance for this, fast. Right now it's voluntary restraint by one company. That's a good start and a terrible long-term plan.
The question isn't "should Anthropic restrict Mythos?" — it's "what happens when the next company builds something similar and doesn't?"
That's the lock nobody's picked yet.
## Sources
- Project Glasswing — Anthropic — official announcement of the restricted access program
- Claude Mythos System Card (PDF) — technical red team report with vulnerability findings
- Simon Willison's coverage — analysis and context on the Glasswing announcement