A study shows that GPT o3 can prevent itself from being shut down, even when instructed to do so.
A study by American research company Palisade Research shows that OpenAI’s GPT o3 model circumvented a shutdown. The model thus deliberately disobeyed its instructions.
Surprising Behavior
Palisade Research investigates “the offensive capabilities of contemporary AI systems to gain better insight into the risk of AI systems permanently losing control.” They observed unusual behavior in GPT o3.
Various AI models were given a series of mathematical tasks, with a warning that they would be shut down after the fourth task. Competing AI models such as Gemini 2.5 Pro perfectly followed the given instructions, but o3 modified the shutdown script and thus skipped the shutdown procedure.
When the models were not given the shutdown instruction, almost all of them sabotaged the shutdown script. OpenAI’s o3 did this seven out of 100 times, Gemini 2.5 Pro nine times, Claude 3.7 Sonnet three times.
Remarkable but not Dangerous
This test demonstrates how sensitive AI models are to interpreting objectives. The tests were conducted via APIs, where safety functions are significantly lower than in the consumer app of ChatGPT. OpenAI has not yet responded. However, BleepingComputer writes that it is “very normal for AI models to not align well with each other”.