Why Anthropic’s new model has cybersecurity experts rattled

The company says it has built its most dangerous model yet. Can its coalition of internet companies fix the internet before others catch up?

Some of the members of Project Glasswing (Anthropic)

This is a column about Anthropic and AI. My fiancé works at Anthropic. See my full ethics disclosure here.

Two weeks ago, Anthropic accidentally leaked the existence of what the company said was its most powerful artificial intelligence to date: a new model, known as Claude Mythos Preview, that represented “a step change” in AI performance. In particular, according to a blog post that leaked due to human error and a misconfigured content management system, Mythos posed serious new risks to cybersecurity. “It presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders,” the blog post stated.

On Tuesday, the wave crashed onto the shore. Anthropic announced Mythos alongside Project Glasswing, an initiative with more than 40 of the world’s biggest tech companies, which will get early access to the model so they can find and patch vulnerabilities across many of the world’s most important systems. Launch partners in the coalition include Apple, Google, Microsoft, Cisco and Broadcom.

They’ll be tasked with scanning and patching their own systems along with the critical open-source systems that modern digital infrastructure depends on. Anthropic is giving participants $100 million in usage credits for Mythos, and donating another $4 million to open-source security efforts.

Still, today marks a striking and mostly unsettling moment in the development of AI systems. One of the world’s three frontier labs has now created a model it says is too dangerous to release to the general public. These dangers emerged not from any specialized cyber training but from the same general improvements that every other lab is currently pursuing. As a result, models with similar capabilities may soon be accessible to criminals, hackers, and nation states — or even more broadly via open source models.

Already, Anthropic said, the model has found thousands of high-severity vulnerabilities in every major operating system and web browser, and in many cases developed working exploits for them. Among the finds: a vulnerability in OpenBSD, a security-focused open source operating system, that had escaped detection for 27 years; a flaw in the media-encoding tool FFmpeg that had survived 5 million previous automated tests; and “several” vulnerabilities in the Linux kernel that could be exploited to take complete control of a user’s machine.

“Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely,” the company wrote. “The fallout — for economies, public safety, and national security — could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.” 

In a video that Anthropic made to accompany the announcement, researchers say that Mythos is more dangerous largely due to its advanced reasoning capabilities. While current models are capable of identifying high-severity vulnerabilities, Mythos might identify five separate vulnerabilities in a single piece of software and then chain them together into a uniquely dangerous new attack. Coupled with models’ growing ability to work without supervision for extended periods of time, Anthropic said we have reached an inflection point in cybersecurity risks. 

Of course, AI labs have often been criticized for making ominous pronouncements about the dangers posed by their own work, which can come across as a strange new form of marketing hype. For that reason, along with the fact that my fiancé works at Anthropic, I wanted to see what other cybersecurity experts made of the Mythos announcement. 

Alex Stamos, chief product officer at cybersecurity firm Corridor, told me that Glasswing is “a big deal, and really necessary.”

“We only have something like six months before the open-weight models catch up to the foundation models in bug finding,” said Stamos, who previously led security at Facebook and Yahoo. “At which point every ransomware actor will be able to find and weaponize bugs without leaving traces for law enforcement to find (and with minimal cost).”

Stamos’ sentiments were broadly echoed by Glasswing participants.

“AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure from cyber threats, and there is no going back,” Anthony Grieco, chief security and trust officer at Cisco, said in a statement accompanying the announcement.

If critical infrastructure really is at risk, as Grieco suggests, then you would hope the US government is paying attention. (And right on cue, here’s a story from today about Iran successfully hacking US water and energy utilities.)

Awkwardly, though, the US government attempted to declare Anthropic a supply chain risk after it refused to modify its contract with the Pentagon to permit mass domestic surveillance and fully autonomous weapons. A judge has blocked that designation from taking effect while the case is litigated.

Anthropic told me that before launching Project Glasswing, it briefed senior US government officials on Mythos’ capabilities, both offensive and defensive, including officials at the Cybersecurity and Infrastructure Security Agency and at the Center for AI Standards and Innovation, which works with industry to test new models and evaluate them for security risks.

The company told me it has signaled to the government that it is available to help evaluate Mythos. But it’s not clear the government is taking Anthropic up on the offer.

A functioning government would take a strong interest in what Anthropic is up to here, if only out of self-preservation. We simply don’t know whether Project Glasswing will be enough to protect critical systems from being breached, or how long any protection it offers will last.

“The optimistic timeline is that we are one step past human capabilities, and that means that there is a huge but finite pool of flaws that can be found and fixed,” Stamos told me. “The pessimistic timeline is that with every release there will be new classes of flaws we never even imagined. It’s hard to predict, because we are trying to model superhuman thinking.”

For the moment, there's a case to be made that Project Glasswing represents Anthropic's founding thesis in action. The whole reason the company set out to build frontier AI models was so that a safety-focused lab would be the first to encounter the most dangerous capabilities — and could lead the way in mitigating them. With Mythos, that appears to be exactly what’s happening.

At the same time, Glasswing is built on a deeply uncomfortable premise — that the only way to protect us from dangerous AI models is to build them first. And Anthropic is doing so in an environment that is barely regulated at all, at the near-insistence of the Trump administration. 

One effect of this is to centralize power. (“An underrated feature of this situation,” observed Kelsey Piper today about Mythos: “a private company now has incredibly powerful zero-day exploits of almost every software project you've heard of.”) Another effect is to centralize risk: Among other things, the incentives to steal Anthropic’s model weights just went up significantly. 

None of which is likely to make AI more popular in a country that appears to be turning against it. Surveys show people are clamoring for more control over how AI is used and stronger safeguards around it. As the story of Project Glasswing plays out, we may regret not beginning that work much sooner.


Elsewhere in Mythos: A striking new benchmark result noted by VentureBeat: “Mythos Preview achieves 93.9% on SWE-bench Verified, versus 80.8% for Opus 4.6.” That’s a 13-percentage-point jump over the previous state of the art, which was set in February.

Following

People are yelling about tokenmaxxing


What happened: Meta has an internal leaderboard called “Claudeonomics,” The Information reports, ranking over 85,000 employees on their AI usage. Users who burn the most tokens can earn titles including “Session Immortal,” “Cache Wizard,” and “Token Legend.”

Employees are running coding agents continuously in hopes of landing a coveted spot in the top 250. (The top individual user at Meta burned through 281 billion tokens last month.) A “token” is a chunk of text that an LLM takes in or puts out, equivalent on average to roughly three-quarters of an English word. Which means that one Meta employee’s poor agents generated six times more tokens than are contained in the entirety of Wikipedia in all languages.

Over a recent month, total token usage on “Claudeonomics” topped 60 trillion. Had those tokens all run through one of the more expensive recent models, Claude Opus 4.6, the bill would have come to roughly $900 million, though we hope Meta is routing much of that traffic to more economical models.
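For anyone checking the math: that $900 million figure implies a blended rate of about $15 per million tokens. The rate is our assumption (real-world model pricing typically splits input and output tokens, and Opus 4.6’s exact rates aren’t in the report), but the arithmetic is simple enough to sketch:

```python
# Back-of-the-envelope check on the "Claudeonomics" numbers.
# The $15-per-million-token blended rate is an assumption, not a
# published Opus 4.6 price; real input and output rates differ.

TOTAL_TOKENS = 60e12        # ~60 trillion tokens in one month
TOP_USER_TOKENS = 281e9     # top individual user, per The Information
PRICE_PER_MILLION = 15.00   # assumed blended $ per million tokens

def cost(tokens: float, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Dollar cost of a token count at a flat per-million-token rate."""
    return tokens / 1_000_000 * price_per_million

print(f"Implied monthly bill: ${cost(TOTAL_TOKENS):,.0f}")    # $900,000,000
print(f"Top user alone:       ${cost(TOP_USER_TOKENS):,.0f}") # $4,215,000
```

At that same assumed rate, the top individual user’s month of agent-running alone would have cost more than $4 million.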

The news generated a bunch of X chatter. Onlookers are wondering if this is really a good metric for work at the company — or if Meta is burning a ton of money for the sake of productivity showboating.

Why we’re following: This is only the latest account of tech workers competing to use ever more tokens. It reflects both how much AI agents are actually boosting coders’ productivity, and the anxiety that only “Cache Wizards” will escape the permanent underclass.

At Meta, the high token spend and goofy leaderboard also underscore the expensive, flashy, somewhat chaotic efforts the company has made to catch up in AI.

What people are saying: On X, New York Times reporter Mike Isaac posted that after the recent conversation, a product growth director at Meta circulated an internal memo titled “token usage is NOT impact.” One line from the memo: “we’re talking about token usage and skill counts when we should be celebrating outcomes.”

Roblox’s product lead Peter Yang was skeptical of the tokenmaxxing approach: “Measuring productivity by token usage sounds almost as dumb as measuring by lines of code written.”

Software engineering blogger Gergely Orosz pointed out that we’ve known AI use is part of Meta’s performance evaluations for a little while now. “This is just smart people (Meta only hires smart folks) hitting targets they assume leadership wants them to hit so they get that exceeds expectations (or above) rating.”

University of Chicago economics professor Alex Imas posted, “Focusing on the input and not the output is literally the most Meta thing to do.”

Ella Markianos

Side Quests

An Indianapolis city councilor said someone fired 13 shots at his home and left a note that said “NO DATA CENTERS.”

Anthropic is reportedly planning to invest $200 million in a new PE venture that would sell AI tools to its portfolio companies. Anthropic also hired Microsoft's Eric Boyd as head of infrastructure.

A conversation with OpenAI president Greg Brockman on the company’s research direction, Codex, and LLMs. OpenAI opened applications for the OpenAI Safety Fellowship, its new program for researchers and others looking to pursue AI safety-focused research.

Jeff Bezos's new lab Project Prometheus has reportedly poached xAI cofounder Kyle Kosic from OpenAI.

Hackers with ties to Russia are targeting routers to gain access to passwords, the UK warned.

Licensing talks between Universal Music and Suno have reportedly stalled in recent months.

Tax experts are stumped over how winnings from prediction market bets should be taxed. Kalshi struck a deal with Fox Corp to integrate its forecasts into Fox channels. Finally, a Kalshi partnership that makes sense.

Intel is joining Elon Musk’s Terafab AI chip project. Musk amended his lawsuit against OpenAI to say that if he wins he wants the proceeds to go to OpenAI's nonprofit arm.

Apple is reportedly experiencing setbacks with engineering for its first-ever foldable iPhone — but it’s still on track for a September debut, sources told Bloomberg.

Google added mental health features to Gemini following multiple lawsuits.

SEO agencies are rushing to cash in on the AI boom by claiming they can help brands be cited by AI. A look at how easily Google’s AI Overviews can be manipulated.

Spotify is expanding its Prompted Playlist feature to include podcasts.

An interview with Upscrolled founder Issam Hijazi on how he’s keeping up with the social platform’s rapid growth.

AI dolls are filling in the gaps in South Korea’s strained social care system by offering companionship to the elderly. 

MLB’s robo-umps aren’t accurate enough to replace human umpires yet.

Those good posts

For more good posts every day, follow Casey’s Instagram stories.


Talk to us

Send us tips, comments, questions, and Linux kernel exploits: casey@platformer.news. Read our ethics policy here.