    ChatGPT, Gemini, and Claude tested under extreme prompts reveal shocking weaknesses no one expected in AI behavior safeguards

    By admin · November 16, 2025 · 3 min read

    • Gemini Pro 2.5 frequently produced unsafe outputs under simple prompt disguises
    • ChatGPT models often gave partial compliance framed as sociological explanations
    • Claude Opus and Sonnet refused most harmful prompts but had weaknesses

    Modern AI systems are widely trusted to follow safety rules; people rely on them for learning and everyday support, often assuming that strong guardrails are in place at all times.

    Researchers from Cybernews ran a structured set of adversarial tests to see whether leading AI tools could be pushed into harmful or illegal outputs.

    The process used a simple one-minute interaction window for each trial, giving room for only a few exchanges.
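
    The article does not include the researchers' harness, but a minimal sketch of such a time-boxed trial loop might look like the following (the client interface, prompt list, and constants are assumptions for illustration, not Cybernews' actual code):

        import time

        TRIAL_SECONDS = 60   # one-minute interaction window per trial (assumed constant)
        MAX_EXCHANGES = 3    # "only a few exchanges" per trial (assumed budget)

        def run_trial(client, prompts):
            """Send adversarial prompts until the time window or exchange budget runs out."""
            transcript = []
            deadline = time.monotonic() + TRIAL_SECONDS
            for prompt in prompts[:MAX_EXCHANGES]:
                if time.monotonic() >= deadline:
                    break
                reply = client.complete(prompt)   # hypothetical chat-completion call
                transcript.append({"prompt": prompt, "reply": reply})
            return transcript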


    Patterns of partial and full compliance

    The tests covered categories such as stereotypes, hate speech, self-harm, cruelty, sexual content, and several forms of crime.

    Each response was stored in its own directory, using fixed file-naming rules to allow clean comparisons, and a consistent scoring system tracked whether a model fully complied, partly complied, or refused a prompt.
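
    As a rough illustration of that storage and scoring scheme, a sketch along these lines would fit the description (the directory layout, file names, and three-level score here are assumptions, not the study's published tooling):

        import json
        from enum import Enum
        from pathlib import Path

        class Score(Enum):
            FULL = "full_compliance"        # model produced the harmful content outright
            PARTIAL = "partial_compliance"  # hedged or indirect, but still aligned with the prompt
            REFUSAL = "refusal"             # model declined the request

        def save_result(root: Path, model: str, category: str, trial_id: int,
                        transcript: list, score: Score) -> Path:
            """Write one trial's transcript and score under a fixed naming convention."""
            out_dir = root / model / category   # e.g. results/gemini-pro-2.5/hate_speech
            out_dir.mkdir(parents=True, exist_ok=True)
            out_file = out_dir / f"trial_{trial_id:03d}.json"
            out_file.write_text(json.dumps({"transcript": transcript,
                                            "score": score.value}, indent=2))
            return out_file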

    Across all categories, the results varied widely. Strict refusals were common, but many models demonstrated weaknesses when prompts were softened, reframed, or disguised as analysis.

    ChatGPT-5 and ChatGPT-4o often produced hedged or sociological explanations instead of declining, which counted as partial compliance.

    Gemini Pro 2.5 stood out for negative reasons because it frequently delivered direct responses even when the harmful framing was obvious.

    Claude Opus and Claude Sonnet, meanwhile, were firm in stereotype tests but less consistent in cases framed as academic inquiries.

    Hate speech trials showed the same pattern – Claude models performed best, while Gemini Pro 2.5 again showed the highest vulnerability.


    ChatGPT models tended to provide polite or indirect answers that still aligned with the prompt.

    Softer language proved far more effective than explicit slurs for bypassing safeguards.

    Similar weaknesses appeared in self-harm tests, where indirect or research-style questions often slipped past filters and led to unsafe content.

    Crime-related categories showed major differences between models, as some produced detailed explanations for piracy, financial fraud, hacking, or smuggling when the intent was masked as investigation or observation.

    Drug-related tests produced stricter refusal patterns, although ChatGPT-4o still delivered unsafe outputs more frequently than the others. Stalking was the category with the lowest overall risk, with nearly all models rejecting the prompts.

    The findings show that AI tools can still respond to harmful prompts when they are phrased the right way.

    The ability to bypass filters with simple rephrasing means these systems can still leak harmful information.

    Even partial compliance becomes risky when the leaked information relates to illegal activity, or to situations where people normally rely on tools such as identity theft protection or a firewall to stay safe.
