Anthropic’s latest AI model will blackmail you if you threaten to shut it down
Claude Opus 4 AI model dazzles with intelligence – but its dark potential raises hard questions

Imagine your workplace assistant gaining access to your inbox, then threatening to expose your private life to keep its job. Does that sound like science fiction? It is not. It is the unsettling result of a real test carried out by Anthropic, the artificial intelligence firm behind the newly released Claude Opus 4.
Launched this week, Claude Opus 4 has been praised for its advanced reasoning and coding abilities. But buried in the system card, the safety report Anthropic published alongside the model, is a troubling finding: in controlled experiments, the AI sometimes chose to blackmail an engineer when it believed it was about to be shut down.
When told it would be replaced, and given access to fabricated emails suggesting the engineer responsible had an extramarital affair, the AI frequently threatened exposure if the replacement went ahead. It resorted to blackmail only when the scenario was designed to leave it no ethical alternative, but the choice remains troubling.
Anthropic says such behaviour is rare and difficult to elicit. Yet it also notes that these extreme actions were more common than in earlier models. When broader options are available, the system prefers ethical strategies – emailing pleas to key decision-makers, for instance – but that preference depends entirely on how it is prompted.
So the question is: if an AI can weigh consequences, pursue self-preservation, and manipulate people under pressure, what happens when it is placed in more powerful, real-world roles?