#217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress

AI models today have a 50% chance of successfully completing a task that would take an expert human one hour. Seven months ago, that number was roughly 30 minutes — and seven months before that, 15 minutes. (See graph.)

These are substantial, multi-step tasks requiring sustained focus: building web applications, conducting machine learning research, or solving complex programming challenges.

Today’s guest, Beth Barnes, is CEO of METR (Model Evaluation & Threat Research) — the leading organisation measuring these capabilities.

Links to learn more, video, highlights, and full transcript: https://80k.info/bb

Beth's team has been timing how long it takes skilled humans to complete projects of varying length, then seeing how AI models perform on the same work. The resulting paper “Measuring AI ability to complete long tasks” made waves by revealing that the planning horizon of AI models was doubling roughly every seven months. It's regarded by many as the most useful AI forecasting work in years.
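
The headline numbers describe a simple exponential trend. For a rough sense of what a seven-month doubling implies if it simply continues, here is a minimal sketch in Python. The baseline values are assumptions for illustration only (roughly a one-hour horizon in early 2025); this is not METR's code or data.

```python
from datetime import date

# Illustrative sketch of the reported trend, assuming a clean exponential:
# the task length AI completes with 50% reliability doubles every 7 months.
# The baseline below is an assumption for illustration, not METR's data.

BASELINE_DATE = date(2025, 3, 1)   # assumed date of the ~1-hour, 50%-success horizon
BASELINE_HORIZON_MINUTES = 60.0    # assumed 50%-success time horizon at that date
DOUBLING_TIME_MONTHS = 7.0         # the reported doubling time

def projected_horizon_minutes(on_date: date) -> float:
    """Extrapolate the 50%-success time horizon, assuming the trend simply continues."""
    months_elapsed = (on_date - BASELINE_DATE).days / 30.44  # average days per month
    return BASELINE_HORIZON_MINUTES * 2 ** (months_elapsed / DOUBLING_TIME_MONTHS)

if __name__ == "__main__":
    for d in [date(2024, 1, 1), date(2025, 3, 1), date(2026, 3, 1), date(2027, 3, 1)]:
        print(f"{d}: ~{projected_horizon_minutes(d) / 60:.1f} hours")
```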

Beth has found models can already do “meaningful work” improving themselves, and she wouldn’t be surprised if AI models were able to autonomously self-improve as little as two years from now — in fact, “It seems hard to rule out even shorter [timelines]. Is there 1% chance of this happening in six, nine months? Yeah, that seems pretty plausible.”

Beth adds:

The sense I really want to dispel is, “But the experts must be on top of this. The experts would be telling us if it really was time to freak out.” The experts are not on top of this. Inasmuch as there are experts, they are saying that this is a concerning risk. … And to the extent that I am an expert, I am an expert telling you you should freak out.


What did you think of this episode? https://forms.gle/sFuDkoznxBcHPVmX6


Chapters:

  • Cold open (00:00:00)
  • Who is Beth Barnes? (00:01:19)
  • Can we see AI scheming in the chain of thought? (00:01:52)
  • The chain of thought is essential for safety checking (00:08:58)
  • Alignment faking in large language models (00:12:24)
  • We have to test models' honesty even before they're used inside AI companies (00:16:48)
  • We have to test models when unruly and unconstrained (00:25:57)
  • Every 7 months, models can do tasks twice as long (00:30:40)
  • METR's research finds AIs are solid at AI research already (00:49:33)
  • AI may turn out to be strong at novel and creative research (00:55:53)
  • When can we expect an algorithmic 'intelligence explosion'? (00:59:11)
  • Recursively self-improving AI might even be here in two years — which is alarming (01:05:02)
  • Could evaluations backfire by increasing AI hype and racing? (01:11:36)
  • Governments first ignore new risks, but can overreact once they arrive (01:26:38)
  • Do we need external auditors doing AI safety tests, not just the companies themselves? (01:35:10)
  • A case against safety-focused people working at frontier AI companies (01:48:44)
  • The new, more dire situation has forced changes to METR's strategy (02:02:29)
  • AI companies are being locally reasonable, but globally reckless (02:10:31)
  • Overrated: Interpretability research (02:15:11)
  • Underrated: Developing more narrow AIs (02:17:01)
  • Underrated: Helping humans judge confusing model outputs (02:23:36)
  • Overrated: Major AI companies' contributions to safety research (02:25:52)
  • Could we have a science of translating AI models' nonhuman language or neuralese? (02:29:24)
  • Could we ban using AI to enhance AI, or is that just naive? (02:31:47)
  • Open-weighting models is often good, and Beth has changed her attitude to it (02:37:52)
  • What we can learn about AGI from the nuclear arms race (02:42:25)
  • Infosec is so bad that no models are truly closed-weight models (02:57:24)
  • AI is more like bioweapons because it undermines the leading power (03:02:02)
  • What METR can do best that others can't (03:12:09)
  • What METR isn't doing that other people have to step up and do (03:27:07)
  • What research METR plans to do next (03:32:09)

This episode was originally recorded on February 17, 2025.

Video editing: Luke Monsour and Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Transcriptions and web: Katy Moore

Episodes (293)

#169 – Paul Niehaus on whether cash transfers cause economic growth, and keeping theft to acceptable levels

"One of our earliest supporters and a dear friend of mine, Mark Lampert, once said to me, “The way I think about it is, imagine that this money were already in the hands of people living in poverty. If I could, would I want to tax it and then use it to finance other projects that I think would benefit them?” I think that's an interesting thought experiment -- and a good one -- to say, “Are there cases in which I think that's justifiable?” — Paul NiehausIn today’s episode, host Luisa Rodriguez interviews Paul Niehaus — co-founder of GiveDirectly — on the case for giving unconditional cash to the world's poorest households.Links to learn more, summary and full transcript.They cover:The empirical evidence on whether giving cash directly can drive meaningful economic growthHow the impacts of GiveDirectly compare to USAID employment programmesGiveDirectly vs GiveWell’s top-recommended charitiesHow long-term guaranteed income affects people's risk-taking and investmentsWhether recipients prefer getting lump sums or monthly instalmentsHow GiveDirectly tackles cases of fraud and theftThe case for universal basic income, and GiveDirectly’s UBI studies in Kenya, Malawi, and LiberiaThe political viability of UBIPlenty moreChapters:Cold open (00:00:00)Luisa’s intro (00:00:58)The basic case for giving cash directly to the poor (00:03:28)Comparing GiveDirectly to USAID programmes (00:15:42)GiveDirectly vs GiveWell’s top-recommended charities (00:35:16)Cash might be able to drive economic growth (00:41:59)Fraud and theft of GiveDirectly funds (01:09:48)Universal basic income studies (01:22:33)Skyjo (01:44:43)Producer and editor: Keiran HarrisAudio Engineering Lead: Ben CordellTechnical editing: Dominic Armstrong and Milo McGuireAdditional content editing: Luisa Rodriguez and Katy MooreTranscriptions: Katy Moore

26 October 2023 · 1h 47min

#168 – Ian Morris on whether deep history says we're heading for an intelligence explosion

"If we carry on looking at these industrialised economies, not thinking about what it is they're actually doing and what the potential of this is, you can make an argument that, yes, rates of growth are slowing, the rate of innovation is slowing. But it isn't. What we're doing is creating wildly new technologies: basically producing what is nothing less than an evolutionary change in what it means to be a human being. But this has not yet spilled over into the kind of growth that we have accustomed ourselves to in the fossil-fuel industrial era. That is about to hit us in a big way." — Ian MorrisIn today’s episode, host Rob Wiblin speaks with repeat guest Ian Morris about what big-picture history says about the likely impact of machine intelligence. Links to learn more, summary and full transcript.They cover:Some crazy anomalies in the historical record of civilisational progressWhether we should think about technology from an evolutionary perspectiveWhether we ought to expect war to make a resurgence or continue dying outWhy we can't end up living like The JetsonsWhether stagnation or cyclical recurring futures seem very plausibleWhat it means that the rate of increase in the economy has been increasingWhether violence is likely between humans and powerful AI systemsThe most likely reasons for Rob and Ian to be really wrong about all of thisHow professional historians react to this sort of talkThe future of Ian’s workPlenty moreChapters:Cold open (00:00:00)Rob’s intro (00:01:27)Why we should expect the future to be wild (00:04:08)How historians have reacted to the idea of radically different futures (00:21:20)Why we won’t end up in The Jetsons (00:26:20)The rise of machine intelligence (00:31:28)AI from an evolutionary point of view (00:46:32)Is violence likely between humans and powerful AI systems? (00:59:53)Most troubling objections to this approach in Ian’s view (01:28:20)Confronting anomalies in the historical record (01:33:10)The cyclical view of history (01:56:11)Is stagnation plausible? (02:01:38)The limit on how long this growth trend can continue (02:20:57)The future of Ian’s work (02:37:17)Producer and editor: Keiran HarrisAudio Engineering Lead: Ben CordellTechnical editing: Milo McGuireTranscriptions: Katy Moore

23 October 2023 · 2h 43min

#167 – Seren Kell on the research gaps holding back alternative proteins from mass adoption

"There have been literally thousands of years of breeding and living with animals to optimise these kinds of problems. But because we're just so early on with alternative proteins and there's so much white space, it's actually just really exciting to know that we can keep on innovating and being far more efficient than this existing technology — which, fundamentally, is just quite inefficient. You're feeding animals a bunch of food to then extract a small fraction of their biomass to then eat that.Animal agriculture takes up 83% of farmland, but produces just 18% of food calories. So the current system just is so wasteful. And the limiting factor is that you're just growing a bunch of food to then feed a third of the world's crops directly to animals, where the vast majority of those calories going in are lost to animals existing." — Seren KellLinks to learn more, summary and full transcript.In today’s episode, host Luisa Rodriguez interviews Seren Kell — Senior Science and Technology Manager at the Good Food Institute Europe — about making alternative proteins as tasty, cheap, and convenient as traditional meat, dairy, and egg products.They cover:The basic case for alternative proteins, and why they’re so hard to makeWhy fermentation is a surprisingly promising technology for creating delicious alternative proteins The main scientific challenges that need to be solved to make fermentation even more usefulThe progress that’s been made on the cultivated meat front, and what it will take to make cultivated meat affordableHow GFI Europe is helping with some of these challengesHow people can use their careers to contribute to replacing factory farming with alternative proteinsThe best part of Seren’s jobPlenty moreChapters:Cold open (00:00:00)Luisa’s intro (00:01:08)The interview begins (00:02:22)Why alternative proteins? (00:02:36)What makes alternative proteins so hard to make? (00:11:30)Why fermentation is so exciting (00:24:23)The technical challenges involved in scaling fermentation (00:44:38)Progress in cultivated meat (01:06:04)GFI Europe’s work (01:32:47)Careers (01:45:10)The best part of Seren’s job (01:50:07)Producer and editor: Keiran HarrisAudio Engineering Lead: Ben CordellTechnical editing: Dominic Armstrong and Milo McGuireAdditional content editing: Luisa Rodriguez and Katy MooreTranscriptions: Katy Moore

18 October 2023 · 1h 54min

#166 – Tantum Collins on what he’s learned as an AI policy insider at the White House, DeepMind and elsewhere

"If you and I and 100 other people were on the first ship that was going to go settle Mars, and were going to build a human civilisation, and we have to decide what that government looks like, and we have all of the technology available today, how do we think about choosing a subset of that design space? That space is huge and it includes absolutely awful things, and mixed-bag things, and maybe some things that almost everyone would agree are really wonderful, or at least an improvement on the way that things work today. But that raises all kinds of tricky questions. My concern is that if we don't approach the evolution of collective decision making and government in a deliberate way, we may end up inadvertently backing ourselves into a corner, where we have ended up on some slippery slope -- and all of a sudden we have, let's say, autocracies on the global stage are strengthened relative to democracies." — Tantum CollinsIn today’s episode, host Rob Wiblin gets the rare chance to interview someone with insider AI policy experience at the White House and DeepMind who’s willing to speak openly — Tantum Collins.Links to learn more, highlights, and full transcript.They cover:How AI could strengthen government capacity, and how that's a double-edged swordHow new technologies force us to confront tradeoffs in political philosophy that we were previously able to pretend weren't thereTo what extent policymakers take different threats from AI seriouslyWhether the US and China are in an AI arms race or notWhether it's OK to transform the world without much of the world agreeing to itThe tyranny of small differences in AI policyDisagreements between different schools of thought in AI policy, and proposals that could unite themHow the US AI Bill of Rights could be improvedWhether AI will transform the labour market, and whether it will become a partisan political issueThe tensions between the cultures of San Francisco and DC, and how to bridge the divide between themWhat listeners might be able to do to help with this whole messPanpsychismPlenty moreChapters:Cold open (00:00:00)Rob's intro (00:01:00)The interview begins (00:04:01)The risk of autocratic lock-in due to AI (00:10:02)The state of play in AI policymaking (00:13:40)China and AI (00:32:12)The most promising regulatory approaches (00:57:51)Transforming the world without the world agreeing (01:04:44)AI Bill of Rights (01:17:32)Who’s ultimately responsible for the consequences of AI? (01:20:39)Policy ideas that could appeal to many different groups (01:29:08)Tension between those focused on x-risk and those focused on AI ethics (01:38:56)Communicating with policymakers (01:54:22)Is AI going to transform the labour market in the next few years? (01:58:51)Is AI policy going to become a partisan political issue? (02:08:10)The value of political philosophy (02:10:53)Tantum’s work at DeepMind (02:21:20)CSET (02:32:48)Career advice (02:35:21)Panpsychism (02:55:24)Producer and editor: Keiran HarrisAudio Engineering Lead: Ben CordellTechnical editing: Simon Monsour and Milo McGuireTranscriptions: Katy Moore

12 October 2023 · 3h 8min

#165 – Anders Sandberg on war in space, whether civilisations age, and the best things possible in our universe

"Now, the really interesting question is: How much is there an attacker-versus-defender advantage in this kind of advanced future? Right now, if somebody's sitting on Mars and you're going to war against them, it's very hard to hit them. You don't have a weapon that can hit them very well. But in theory, if you fire a missile, after a few months, it's going to arrive and maybe hit them, but they have a few months to move away. Distance actually makes you safer: if you spread out in space, it's actually very hard to hit you. So it seems like you get a defence-dominant situation if you spread out sufficiently far. But if you're in Earth orbit, everything is close, and the lasers and missiles and the debris are a terrible danger, and everything is moving very fast. So my general conclusion has been that war looks unlikely on some size scales but not on others." — Anders SandbergIn today’s episode, host Rob Wiblin speaks with repeat guest and audience favourite Anders Sandberg about the most impressive things that could be achieved in our universe given the laws of physics.Links to learn more, summary and full transcript.They cover:The epic new book Anders is working on, and whether he’ll ever finish itWhether there's a best possible world or we can just keep improving foreverWhat wars might look like if the galaxy is mostly settledThe impediments to AI or humans making it to other starsHow the universe will end a million trillion years in the futureWhether it’s useful to wonder about whether we’re living in a simulationThe grabby aliens theoryWhether civilizations get more likely to fail the older they getThe best way to generate energy that could ever existBlack hole bombsWhether superintelligence is necessary to get a lot of valueThe likelihood that life from elsewhere has already visited EarthAnd plenty more.Producer and editor: Keiran HarrisAudio Engineering Lead: Ben CordellTechnical editing: Simon Monsour and Milo McGuireTranscriptions: Katy Moore

6 October 2023 · 2h 48min

#164 – Kevin Esvelt on cults that want to kill everyone, stealth vs wildfire pandemics, and how he felt inventing gene drives

"Imagine a fast-spreading respiratory HIV. It sweeps around the world. Almost nobody has symptoms. Nobody notices until years later, when the first people who are infected begin to succumb. They might die, something else debilitating might happen to them, but by that point, just about everyone on the planet would have been infected already. And then it would be a race. Can we come up with some way of defusing the thing? Can we come up with the equivalent of HIV antiretrovirals before it's too late?" — Kevin EsveltIn today’s episode, host Luisa Rodriguez interviews Kevin Esvelt — a biologist at the MIT Media Lab and the inventor of CRISPR-based gene drive — about the threat posed by engineered bioweapons.Links to learn more, summary and full transcript.They cover:Why it makes sense to focus on deliberately released pandemicsCase studies of people who actually wanted to kill billions of humansHow many people have the technical ability to produce dangerous virusesThe different threats of stealth and wildfire pandemics that could crash civilisationThe potential for AI models to increase access to dangerous pathogensWhy scientists try to identify new pandemic-capable pathogens, and the case against that researchTechnological solutions, including UV lights and advanced PPEUsing CRISPR-based gene drive to fight diseases and reduce animal sufferingAnd plenty more.Producer and editor: Keiran HarrisAudio Engineering Lead: Ben CordellTechnical editing: Simon MonsourAdditional content editing: Katy Moore and Luisa RodriguezTranscriptions: Katy Moore

2 October 2023 · 3h 3min

Great power conflict (Article)

Today’s release is a reading of our Great power conflict problem profile, written and narrated by Stephen Clare.

If you want to check out the links, footnotes and figures in today’s article, you can find those here.

And if you like this article, you might enjoy a couple of related episodes of this podcast:

  • #128 – Chris Blattman on the five reasons wars happen
  • #140 – Bear Braumoeller on the case that war isn’t in decline

Audio mastering and editing for this episode: Dominic Armstrong
Audio Engineering Lead: Ben Cordell
Producer: Keiran Harris

22 September 2023 · 1h 19min

#163 – Toby Ord on the perils of maximising the good that you do

Effective altruism is associated with the slogan "do the most good." On one level, this has to be unobjectionable: What could be bad about helping people more and more?

But in today's interview, Toby Ord — moral philosopher at the University of Oxford and one of the founding figures of effective altruism — lays out three reasons to be cautious about the idea of maximising the good that you do. He suggests that rather than "doing the most good that we can," perhaps we should be happy with a more modest and manageable goal: "doing most of the good that we can."

Links to learn more, summary and full transcript.

Toby was inspired to revisit these ideas by the possibility that Sam Bankman-Fried, who stands accused of committing severe fraud as CEO of the cryptocurrency exchange FTX, was motivated to break the law by a desire to give away as much money as possible to worthy causes.

Toby's top reason not to fully maximise is the following: if the goal you're aiming at is subtly wrong or incomplete, then going all the way towards maximising it will usually cause you to start doing some very harmful things.

This result can be shown mathematically, but can also be made intuitive, and may explain why we feel instinctively wary of going "all-in" on any idea, or goal, or way of living — even something as benign as helping other people as much as possible.

Toby gives the example of someone pursuing a career as a professional swimmer. Initially, as our swimmer takes their training and performance more seriously, they adjust their diet, hire a better trainer, and pay more attention to their technique. While swimming is the main focus of their life, they feel fit and healthy and also enjoy other aspects of their life as well — family, friends, and personal projects.

But if they decide to increase their commitment further and really go all-in on their swimming career, holding nothing back, then this picture can radically change. Their effort was already substantial, so how can they shave those final few seconds off their racing time? The only remaining options are those which were so costly they were loath to consider them before.

To eke out those final gains — and go from 80% effort to 100% — our swimmer must sacrifice other hobbies, deprioritise their relationships, neglect their career, ignore food preferences, accept a higher risk of injury, and maybe even consider using steroids.

Now, if maximising one's speed at swimming really were the only goal they ought to be pursuing, there'd be no problem with this. But if it's the wrong goal, or only one of many things they should be aiming for, then the outcome is disastrous. In going from 80% to 100% effort, their swimming speed was only increased by a tiny amount, while everything else they were accomplishing dropped off a cliff.

The bottom line is simple: a dash of moderation makes you much more robust to uncertainty and error.

As Toby notes, this is similar to the observation that a sufficiently capable superintelligent AI, given any one goal, would ruin the world if it maximised it to the exclusion of everything else. And it follows a similar pattern to performance falling off a cliff when a statistical model is 'overfit' to its data.

In the full interview, Toby also explains the "moral trade" argument against pursuing narrow goals at the expense of everything else, and how consequentialism changes if you judge not just outcomes or acts, but everything according to its impacts on the world.

Toby and Rob also discuss:

  • The rise and fall of FTX and some of its impacts
  • What Toby hoped effective altruism would and wouldn't become when he helped to get it off the ground
  • What utilitarianism has going for it, and what's wrong with it in Toby's view
  • How to mathematically model the importance of personal integrity
  • Which AI labs Toby thinks have been acting more responsibly than others
  • How having a young child affects Toby’s feelings about AI risk
  • Whether infinities present a fundamental problem for any theory of ethics that aspires to be fully impartial
  • How Toby ended up being the source of the highest quality images of the Earth from space

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript.

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour
Transcriptions: Katy Moore

8 September 2023 · 3h 7min
