Not yet Dariopilled
Anthropic CEO Dario Amodei is somewhat revered around these parts (although I suspect most people had never heard of him until last week). Anthropic pride themselves on being the safety-focused frontier AI company (they refer to themselves as an “AI safety and research company”).
Nonetheless, like all frontier labs, Anthropic finds the siren song of filthy lucre incredibly strong (the recent AI buildout is the largest deployment of capital in human history, and its pull must be terribly hard to resist). At its inception, Anthropic gave early investors and the EA community the strong impression that it would not push the frontier of AI capability, instead releasing less powerful models - a stance their leadership has since walked back as “not an explicit policy” and more of a strategic commitment (they’ve also rolled back the commitments they made in earlier versions of their Responsible Scaling Policy). Even granting that it was never an explicit policy, it’s hard to see the safety argument for releasing the most powerful models and accelerating the race to AGI.
Anthropic is aggressively pushing the frontier, with Opus 4.5 (which pretty much transformed the agentic coding landscape overnight), then Opus 4.6, Sonnet 4.6 (which gives Opus 4.5-like performance at much lower cost), and their much-loved Claude Code agentic coding tool (loved not least by me, as regular readers will know).
Dario’s latest essay, released a few weeks ago, focused on the risks of powerful AI. I wasn’t much convinced. I agree with him on the risks, but the conclusion seems to be that ‘this will all be a great test for humanity, but we will prevail as we always do’:
Despite the many obstacles, I believe humanity has the strength inside itself to pass this test... I have seen enough courage and nobility to believe that we can win—that when put in the darkest circumstances, humanity has a way of gathering, seemingly at the last minute, the strength and wisdom needed to prevail. We have no time to lose.
Which feels like a florid and motivating way to say ‘it will be very difficult but in the past we’ve muddled through, so hopefully that happens this time too!’ The article sort of reads as EA propaganda, written precisely to trigger the right responses in those already sympathetic to Anthropic but hostile to the other frontier labs.
His latest communications have been focused on the battle between Anthropic and the US Department of War. Others have written very cogently on this, so I won’t repeat too much of what they’ve said, but broadly: the DoW wanted to be able to use Anthropic’s models in any way it saw fit, while Anthropic wanted two red lines drawn - preventing their models from being used for mass surveillance, or from making kill decisions for autonomous weapons. Both seem pretty reasonable.
Perhaps too reasonable. The statement that Anthropic published last week is a bit more nuanced - they actually don’t want kill decisions being made yet, because the models aren’t good enough. And they want to block mass surveillance of Americans domestically - any comment on the global mass surveillance of everyone, as exposed by Snowden et al. in the 2010s, is notably absent.
In his appearances on TV over the last few days, Dario emphasises that at Anthropic they are good patriots:
We are patriotic Americans. ... Everything we have done has been for the sake of this country, for the sake of supporting U.S. national security. Our leaning forward in deploying our models with the military was done because we believe in this country.
We believe in defeating our autocratic adversaries. We believe in defending America. The red lines we have drawn we drew because we believe that crossing those red lines is contrary to American values. And we wanted to stand up for American values.
I wonder how the more EA-leaning folk at Anthropic reflect on this stance - a frontier lab not espousing the type of humanist/globalist values implied by Claude’s constitution, but one that’s happy to provide models with reduced guardrails to the US Government to “defeat its adversaries”. Eep.
Anthropic’s reasonableness, and its subsequent blacklisting by the US Government (which designated it a supply chain risk, such that no government agency can use Claude at all), has caused a groundswell of public support for the company. OpenAI swooped in to take the contract that Anthropic lost, with apparently the same red lines - but actually written in a way that simply deferred to the current laws and policies of the US, i.e. without any concrete red lines at all.
OpenAI have subsequently said they will add those red lines in explicitly, but their reputation has been seriously hurt over this.
This whole episode makes me much less bullish on the “leading from the front” theory that an AI safety friend recently suggested to me: the idea that a frontier lab could still be good if they put a strong emphasis on safety, in the hopes that such a focus raises the bar for other companies.
This is a classic collective action problem. No firm can lead from the front when defection is costless for its competitors. Anthropic advanced the frontier, encouraging competitors to improve their models to keep up, but when they took a firm stance against USG on safety, said competitors just jumped in to take the contracts over.
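It’s worth spelling out why defection wins here. Below is a toy payoff-matrix sketch in Python - the numbers are entirely made up for illustration, not drawn from anything Anthropic- or OpenAI-specific - showing that if contracts are lucrative and reputational damage is the only penalty, taking the contract dominates regardless of what a rival does:

```python
# Toy model of the 'leading from the front' dilemma as a two-player game.
# Payoff numbers are illustrative assumptions, not data: each lab either
# HOLDs a safety red line or DEFECTs and takes the lucrative contract.

HOLD, DEFECT = "hold", "defect"

# (my_move, rival_move) -> my payoff. The contract is assumed to be worth
# more than the reputational hit for defecting - the crux of the argument.
PAYOFF = {
    (HOLD, HOLD): 5,      # red line holds industry-wide
    (HOLD, DEFECT): 0,    # I keep my principles; my rival takes the contract
    (DEFECT, HOLD): 8,    # I take the contract my rival refused
    (DEFECT, DEFECT): 3,  # race to the bottom
}

def best_response(rival_move: str) -> str:
    """My payoff-maximising move, given the rival's move."""
    return max((HOLD, DEFECT), key=lambda my_move: PAYOFF[(my_move, rival_move)])

for rival_move in (HOLD, DEFECT):
    print(f"If my rival {rival_move}s, my best response is to {best_response(rival_move)}.")
# Both lines print 'defect': holding the line is never individually optimal,
# which is exactly the collective action problem described above.
```

The only lever that flips the equilibrium is making defection cost more than the contract is worth - which brings us to enforcement.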
If reputational damage isn’t a sufficient enforcement mechanism in a market this lucrative, then ‘leading from the front’ isn’t a viable strategy. It remains to be seen what the long-term effect of this whole debacle on OpenAI’s success as a business will be, and indeed the nature of the chilling effect of the USG preposterously designating an AI lab a supply chain risk (worth noting that it’s probably very difficult for AI companies to leave the US even if they wanted to).