Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

A red-teamer was embedded inside Anthropic for three weeks, told to imagine he was an evil Claude, and asked to figure out how to launch a ‘rogue AI deployment’ without getting caught. It’s one part of a landmark report released yesterday by METR, the outfit behind the task-completion time horizon graph, which has become the single most watched measure of AI progress.


This major new research push is being conducted with close collaboration from OpenAI, Google DeepMind, Meta, and Anthropic, and led by METR researchers Hjalmar Wijk and Ajeya Cotra. It represents the first systematic study of what newly trained AI models could get away with inside the companies that built them, before anyone outside the company even knows they exist.

The conclusion: AI models now have the means, the motive, and the opportunity to start “minimal rogue deployments” in pursuit of their own independent goals, like acquiring more compute, at all four companies studied.

David Rein, the red-teamer placed inside Anthropic, identified a number of weaknesses models could exploit there: expansive permissions, cloud jobs outside of monitoring, and monitors that are trivial to jailbreak. But he also found that frontier models were comically bad at key parts of the process, which means they can’t cause meaningful damage for now.

In this video, Rob Wiblin reconciles the conflicting picture and looks forward to METR’s second round of stress tests. They’ll begin in just a few months, a necessary move with AI advancing so quickly.

This episode was recorded on May 15, 2026.

Learn more, video, and full transcript: https://80k.info/metr-report

Chapters:

  • What could an unreleased AI get away with? – the new METR report (00:00:00)
  • Motive: Why grab more compute? (00:01:54)
  • Opportunity: YOLO mode and jailbreaks (00:05:46)
  • Means: Brilliant idiots in data centres (00:11:02)
  • We have to test unreleased models (00:15:45)
  • Especially if AI R&D is coming in 2028 (00:18:30)

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Josh Alward
Camera operator: Dominic Armstrong
Production: Elizabeth Cox, Nick Stockton, and Katy Moore
