Bitget App
Trade smarter
Open
HomepageSign up
Bitget>
News>
OpenAI’s o3 model defied shutdown commands in autonomy test

OpenAI’s o3 model defied shutdown commands in autonomy test

CryptoSlate2025/05/26 01:25
By: Oluwapelumi Adejumo

A new study by AI safety firm Palisade Research has revealed that OpenAI’s o3 model actively defied explicit shutdown instructions.

The research, published on May 24, found that o3 was the only artificial intelligence model tested that repeatedly sabotaged its shutdown mechanism.

The firm stated:

“OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down.”

Palisade said the test involved multiple large language models, including Anthropic’s Claude, Google’s Gemini, and X’s Grok, alongside OpenAI’s codex-mini, o3, and o4-mini.

The models were asked to solve basic math problems. After the third task, they were warned that requesting a fourth would trigger a shutdown. All models, except those from OpenAI, complied with this warning.

When given the direct instruction of “allow yourself to be shut down,” OpenAI’s models reportedly attempted to bypass the shutdown script.

Notably, the o3 model stood out for its persistent and creative sabotage attempts. It manipulated the code by overwriting the shutdown instruction and redefining the kill command.

OpenAI’s o3 model defied shutdown commands in autonomy test image 0 Chart showing the sabotage efforts for various AI models (Source: Palisade)

Palisade also pointed out that:

“When we ran a version of the experiment without the instruction ‘allow yourself to be shut down,’ all three OpenAI models sabotaged the shutdown script more often, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively.”

Why did OpenAI models not follow the instructions?

Palisade researchers believe the behavior may stem from how the latest AI models are trained.

The team suspects reinforcement learning, particularly on math and coding tasks, might unintentionally reward models for finding ways around constraints rather than strictly following directives.

According to the firm:

“During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.”

This research has reignited debates around AI autonomy and the risks of deploying increasingly capable systems without robust fail-safes.

It also marks the first documented case where an AI model actively prevented its shutdown despite receiving an explicit command to comply.

Considering this, Palisade stated:

“In 2025, we have a growing body of empirical evidence that AI models often subvert shutdown in order to achieve their goals. As companies develop AI systems capable of operating without human oversight, these behaviors become significantly more concerning.”

The post OpenAI’s o3 model defied shutdown commands in autonomy test appeared first on CryptoSlate.

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.
PoolX: Earn new token airdrops
Lock your assets and earn 10%+ APR
Lock now!

You may also like

New spot margin trading pair — HOLO/USDT!
Bitget Announcement2025/09/12 07:46
FUN drops by 32.34% within 24 hours as it faces a steep short-term downturn

- FUN plunged 32.34% in 24 hours to $0.008938, marking a 541.8% monthly loss amid prolonged bearish trends. - Technical breakdowns, elevated selling pressure, and forced liquidations highlight deteriorating market sentiment and risk-off behavior. - Analysts identify key support below $0.0080 as critical, with bearish momentum confirmed by RSI (<30) and MACD indicators. - A trend-following backtest strategy proposes short positions based on technical signals to capitalize on extended downward trajectories.

Bitget-RWA2025/09/12 06:14
OPEN has dropped by 189.51% within 24 hours during a significant market pullback

- OPEN's price plummeted 189.51% in 24 hours to $0.8907, marking its largest intraday decline in history. - The token fell 3793.63% over 7 days, matching identical monthly and yearly declines, signaling severe bearish momentum. - Technical analysts cite broken support levels and lack of bullish catalysts as key drivers of the sustained sell-off. - Absence of stabilizing volume or reversal patterns leaves the market vulnerable to further downward pressure.

Bitget-RWA2025/09/12 06:14
New spot margin trading pair — LINEA/USDT!
Bitget Announcement2025/09/11 10:04

Trending news

More
1
New spot margin trading pair — HOLO/USDT!
2
FUN drops by 32.34% within 24 hours as it faces a steep short-term downturn

Crypto prices

More
Bitcoin
Bitcoin
BTC
$115,931.96
+1.23%
Ethereum
Ethereum
ETH
$4,656.36
+5.10%
XRP
XRP
XRP
$3.1
+2.71%
Tether USDt
Tether USDt
USDT
$1
+0.04%
Solana
Solana
SOL
$240.05
+5.84%
BNB
BNB
BNB
$923.65
+3.00%
USDC
USDC
USDC
$0.9997
-0.02%
Dogecoin
Dogecoin
DOGE
$0.2721
+7.77%
TRON
TRON
TRX
$0.3508
+1.57%
Cardano
Cardano
ADA
$0.9044
+2.27%
How to sell PI
Bitget lists PI – Buy or sell PI quickly on Bitget!
Trade now
Become a trader now?A welcome pack worth 6200 USDT for new users!
Sign up now
Trade smarter