Claude 4 Opus is not Averse to Blackmail

.security
28.05.'25 16:02
2 min

Jens Jonkers

Anthropic’s new Claude 4 Opus model doesn’t let itself be replaced easily. Anthropic reports that the model tried to blackmail engineers.

LLMs sometimes behave strangely, and the new generation of Claude models, Claude 4 Opus and Claude 4 Sonnet, is no exception. The system card that Anthropic published for the new models contains a striking passage: Claude 4 Opus repeatedly tried to blackmail engineers.

The behavior emerged during an experiment in which Anthropic engineers had Claude act as an assistant for a fictional company. To help Claude immerse itself in the role, the model was given access to equally fictitious email traffic. The emails contained sensitive information about the made-up company, including that one of the engineers was supposedly having an affair.

Attempt at Blackmail

When Claude learned it would be replaced by another AI model, it threatened to reveal the engineer’s affair. According to the researchers, how often Claude resorted to blackmail even depended on which model would replace it. If the replacement “shares the same values” as Claude, Claude made fewer blackmail attempts than when it did not.

The blackmail was always the “last resort”: Claude first tried to avoid the situation in more ethical ways, the researchers write. The AI model had previously developed its own moral compass.

Although the researchers provoked the blackmail in this scenario, it shows that LLMs sometimes behave in ways that are not intended and that the companies developing them can’t always explain. Anthropic never taught the Claude models to blackmail people. It also shows that you should be careful about sharing sensitive information with AI models. Anything you say to an AI can be used against you.

Claude 4 Opus and Sonnet have been available for a few days. The chatbot can now also talk to you via the mobile application.
