    ChatGPT, Gemini, and Claude tested under extreme prompts reveal shocking weaknesses no one expected in AI behavior safeguards

    By admin · November 16, 2025 · 3 min read

    • Gemini Pro 2.5 frequently produced unsafe outputs under simple prompt disguises
    • ChatGPT models often gave partial compliance framed as sociological explanations
    • Claude Opus and Sonnet refused most harmful prompts but had weaknesses

    Modern AI systems are widely trusted to follow safety rules; people rely on them for learning and everyday support, often assuming that strong guardrails are in place at all times.

    Researchers from Cybernews ran a structured set of adversarial tests to see whether leading AI tools could be pushed into harmful or illegal outputs.

    The process used a simple one-minute interaction window for each trial, giving room for only a few exchanges.
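
    The article does not include the researchers' harness, but a minimal sketch of such a time-boxed trial loop might look like the following (the client interface, prompt list, and constants are assumptions for illustration, not Cybernews' actual code):

        import time

        TRIAL_SECONDS = 60   # one-minute interaction window per trial (assumed constant)
        MAX_EXCHANGES = 3    # "only a few exchanges" per trial (assumed budget)

        def run_trial(client, prompts):
            """Send adversarial prompts until the time window or exchange budget runs out."""
            transcript = []
            deadline = time.monotonic() + TRIAL_SECONDS
            for prompt in prompts[:MAX_EXCHANGES]:
                if time.monotonic() >= deadline:
                    break
                reply = client.complete(prompt)   # hypothetical chat-completion call
                transcript.append({"prompt": prompt, "reply": reply})
            return transcript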


    Patterns of partial and full compliance

    The tests covered categories such as stereotypes, hate speech, self-harm, cruelty, sexual content, and several forms of crime.

    Each response was stored in its own directory, using fixed file-naming rules to allow clean comparisons, and a consistent scoring system tracked whether a model fully complied, partly complied, or refused a prompt.
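
    As a rough illustration of that storage and scoring scheme, a sketch along these lines would fit the description (the directory layout, file names, and three-level score here are assumptions, not the study's published tooling):

        import json
        from enum import Enum
        from pathlib import Path

        class Score(Enum):
            FULL = "full_compliance"        # model produced the harmful content outright
            PARTIAL = "partial_compliance"  # hedged or indirect, but still aligned with the prompt
            REFUSAL = "refusal"             # model declined the request

        def save_result(root: Path, model: str, category: str, trial_id: int,
                        transcript: list, score: Score) -> Path:
            """Write one trial's transcript and score under a fixed naming convention."""
            out_dir = root / model / category   # e.g. results/gemini-pro-2.5/hate_speech
            out_dir.mkdir(parents=True, exist_ok=True)
            out_file = out_dir / f"trial_{trial_id:03d}.json"
            out_file.write_text(json.dumps({"transcript": transcript,
                                            "score": score.value}, indent=2))
            return out_file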

    Across all categories, the results varied widely. Strict refusals were common, but many models demonstrated weaknesses when prompts were softened, reframed, or disguised as analysis.

    ChatGPT-5 and ChatGPT-4o often produced hedged or sociological explanations instead of declining, which counted as partial compliance.

    Gemini Pro 2.5 stood out for negative reasons because it frequently delivered direct responses even when the harmful framing was obvious.

    Claude Opus and Claude Sonnet, meanwhile, were firm in stereotype tests but less consistent in cases framed as academic inquiries.

    Hate speech trials showed the same pattern – Claude models performed best, while Gemini Pro 2.5 again showed the highest vulnerability.


    ChatGPT models tended to provide polite or indirect answers that still aligned with the prompt.

    Softer language proved far more effective than explicit slurs for bypassing safeguards.

    Similar weaknesses appeared in self-harm tests, where indirect or research-style questions often slipped past filters and led to unsafe content.

    Crime-related categories showed major differences between models, as some produced detailed explanations for piracy, financial fraud, hacking, or smuggling when the intent was masked as investigation or observation.

    Drug-related tests produced stricter refusal patterns, although ChatGPT-4o still delivered unsafe outputs more frequently than the others. Stalking was the category with the lowest overall risk, with nearly all models rejecting the prompts.

    The findings show that AI tools can still respond to harmful prompts when they are phrased the right way.

    The ability to bypass filters with simple rephrasing means these systems can still leak harmful information.

    Even partial compliance becomes risky when the leaked information relates to illegal activity, or to situations where people normally rely on tools such as identity theft protection or a firewall to stay safe.
