Claude AI Safeguards - Search News

Order byBest matchMost fresh

Anthropic, blackmail and Claude Opus 4 model

Exclusive: New Claude Model Prompts Safeguards at Anthropic

Anthropic launched Claude 4 Opus, a new model that, in internal testing, performed more effectively than prior models at advising novices on how to produce biological

· 1d · on MSN

Anthropic’s new AI model turns to blackmail when engineers try to take it offline

· 16h · on MSN

AI model threatened to blackmail engineer over affair when told it was being replaced: safety report

Anthropic Releases Claude 4: What’s Improved in AI Models Sonnet & Opus

Anthropic has launched two more Claude AI models: Claude Sonnet 4 and Claude Opus 4.

· 1d

· 2d

Anthropic overtakes OpenAI: Claude Opus 4 codes seven hours nonstop, sets record SWE-Bench score and reshapes enterprise AI

· 1d

Anthropic’s Promises Its New Claude AI Models Are Less Likely to Try to Deceive You

Anthropic's latest Claude AI models are here - and you can try one for free today

Anthropic says Claude Opus 4 is its most powerful model and the best coding model in the world, while Sonnet 4 is replacing Sonnet 3.7 in the chatbot.

· 2d

· 1d

Claude 4 Debuts with Two New Models Focused on Coding and Reasoning

· 2d

Anthropic says its new AI model can work almost an entire workday straight

Independent Journal Review9h

New AI Model Would Rather Ruin Your Life Than Be Turned Off, Researchers Say

Anthropic’s newly released artificial intelligence (AI) model, Claude Opus 4, is willing to strong-arm the humans who keep it alive,

10hon MSN

Anthropic's Claude AI gets smarter—and mischievious

Anthropic launched its latest Claude generative artificial intelligence (GenAI) models on Thursday, claiming to set new standards for reasoning but also building in safeguards against rogue behavior.

NewsX11h

‘Spookiest Shit Ever’: Did An AI Model Blackmail Creator When Faced With Replacement?

An artificial intelligence model reportedly attempted to threaten and blackmail its own creator during internal testing

20h

Claude Opus 4 Pushes Boundaries—And Triggers a New AI Safety Level

In a landmark move underscoring the escalating power and potential risks of modern AI, Anthropic has elevated its flagship Claude Opus 4 to its highest internal safety level, ASL-3. Announced alongside the release of its advanced Claude 4 models,

WinBuzzer1d

Anthropic Faces Backlash amid Surveillance Concerns as Claude 4 AI Might Report Users for “Immoral” Behavior

Anthropic's Claude 4 Opus AI sparks backlash for emergent 'whistleblowing'—potentially reporting users for perceived immoral acts, raising serious questions on AI autonomy, trust, and privacy, despite company clarifications.

2don MSN

Anthropic, now worth $61 billion, unveils its most powerful AI models yet—and they have an edge over OpenAI and Google

Claude Opus 4 and Claude Sonnet 4, Anthropic's latest generation of frontier AI models, were announced Thursday.

1don MSN

Anthropic unveils Claude Opus 4 and Sonnet 4, featuring whistleblowing capability: What it means for users

Anthropic has introduced two advanced AI models, Claude Opus 4 and Claude Sonnet 4, designed for complex coding tasks. However, the models' ability to whistleblow on unethical behavior has raised privacy concerns and sparked controversy regarding AI moral judgment.