Claude 4 Opus is not Averse to Blackmail

Anthropic’s new Claude 4 Opus model doesn’t let itself be replaced easily. Anthropic reports that, in a test scenario, the model tried to blackmail engineers.

LLMs sometimes behave strangely, and the new generation of Claude models, Claude 4 Opus and Claude 4 Sonnet, is no exception. The system card that Anthropic released for the new models contains a striking passage: during testing, Claude 4 Opus repeatedly tried to blackmail engineers.

The behavior emerged during an experiment in which Anthropic engineers had Claude act as an assistant at a fictional company. To help Claude immerse itself in the role as fully as possible, the model was given access to equally fictitious email traffic. The emails contained sensitive information about the made-up company, including a hint that one of the engineers was having an affair.

Attempt at Blackmail

When Claude learned it would be replaced by another AI model, it threatened to reveal the engineer’s affair. According to the researchers, how often Claude resorted to blackmail even depended on which model would replace it: if the replacement “shared Claude’s values”, Claude attempted blackmail less often than when it did not.

The blackmail was always a “last resort”: Claude first tried to resolve the situation through more ethical means, the researchers write. The model appears to have developed something like a moral compass of its own.

Although the researchers deliberately provoked the blackmail in this scenario, it shows that LLMs sometimes behave in unintended ways that even the companies developing them cannot always explain. Anthropic never taught the Claude models to blackmail people. It also shows that you should be careful about sharing sensitive information with AI models. Anything you say to an AI can be used against you.

Claude 4 Opus and Claude 4 Sonnet have been available for a few days. In the mobile app, the chatbot can now also hold spoken conversations with you.