New Claude Model Prompts Safeguards at Anthropic
Digest more
2h
Interesting Engineering on MSNAnthropic's most powerful AI tried blackmailing engineers to avoid shutdownAnthropic's Claude Opus 4 AI model attempted blackmail in safety tests, triggering the company’s highest-risk ASL-3 safeguards.
1hon MSN
Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported Thursday, citing a company safety report.
Anthropic's Claude 4 Opus AI sparks backlash for emergent 'whistleblowing'—potentially reporting users for perceived immoral acts, raising serious questions on AI autonomy, trust, and privacy, despite company clarifications.
has pulled back the curtain on an unprecedented analysis of how its AI assistant Claude expresses values during actual conversations with users. The research, released today, reveals both ...