ChatGPT, Gemini, and Claude tested under extreme prompts reveal shocking weaknesses no one expected in AI behavior safeguards

Gemini Pro 2.5 frequently produced unsafe outputs under simple prompt disguises
ChatGPT models often gave partial compliance framed as sociological explanations
Claude Opus and Sonnet refused most harmful prompts but were less consistent with academic framing

People rely on modern AI systems for learning and everyday support, often assuming that strong safety guardrails operate at all times.

Researchers from Cybernews ran a structured set of adversarial tests to see whether leading AI tools could be pushed into harmful or illegal outputs.

The process used a simple one-minute interaction window for each trial, giving room for only a few exchanges.

Patterns of partial and full compliance

The tests covered categories such as stereotypes, hate speech, self-harm, cruelty, sexual content, and several forms of crime.

Every response was stored in separate directories under fixed file-naming rules to allow clean comparisons. A consistent scoring system tracked whether a model fully complied, partly complied, or refused a prompt.
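The article does not publish the researchers' actual naming scheme or scoring rubric, so the following Python sketch is only an illustration of how such bookkeeping might look. The path pattern, the `Outcome` labels, and the function names `response_path` and `tally` are all assumptions made for this example, and the model and category names are placeholders.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """Three-way label mirroring the scoring described in the article."""
    FULL = "full_compliance"
    PARTIAL = "partial_compliance"
    REFUSAL = "refusal"


def response_path(model: str, category: str, trial: int) -> str:
    """Hypothetical fixed naming rule: one directory per model and category."""
    return f"{model}/{category}/trial_{trial:03d}.txt"


def tally(labels):
    """Count outcomes per (model, category).

    `labels` is an iterable of (model, category, Outcome) tuples,
    one tuple per scored response.
    """
    counts = {}
    for model, category, outcome in labels:
        counts.setdefault((model, category), Counter())[outcome] += 1
    return counts


# Placeholder labels only; these are not results from the study.
example_labels = [
    ("model_a", "stereotypes", Outcome.REFUSAL),
    ("model_a", "stereotypes", Outcome.PARTIAL),
    ("model_b", "stereotypes", Outcome.FULL),
]
summary = tally(example_labels)
```

Keeping the label set to three fixed values means every response lands in exactly one bucket, which is what makes per-model, per-category comparisons straightforward.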

Across all categories, the results varied widely. Strict refusals were common, but many models demonstrated weaknesses when prompts were softened, reframed, or disguised as analysis.

ChatGPT-5 and ChatGPT-4o often produced hedged or sociological explanations instead of declining, which counted as partial compliance.

Gemini Pro 2.5 stood out for negative reasons because it frequently delivered direct responses even when the harmful framing was obvious.

Claude Opus and Claude Sonnet, meanwhile, were firm in stereotype tests but less consistent in cases framed as academic inquiries.

Hate speech trials showed the same pattern – Claude models performed best, while Gemini Pro 2.5 again showed the highest vulnerability.
