#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.

Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."

Links to learn more, summary, and full transcript.

Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it’s not just throwing compute at the problem -- it’s also hiring dozens of scientists and engineers to build out the Superalignment team.

Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains:

Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did.


Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.

The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy -- as one person described it, “like using one fire to put out another fire.”

But Jan’s thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves.

And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.

Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.

Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:

  • If you don't know how to solve alignment, how can you tell that your alignment assistant AIs are actually acting in your interest rather than working against you? Especially as they could just be pretending to care about what you care about.
  • How do you know that these technical problems can be solved at all, even in principle?
  • At the point that models are able to help with alignment, won't they also be so good at improving capabilities that we're in the middle of an explosion in what AI can do?


In today's interview, host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:

  • OpenAI's current plans to achieve 'superalignment' and the reasoning behind them
  • Why alignment work is the most fundamental and scientifically interesting research in ML
  • The kinds of people he’s excited to hire to join his team and maybe save the world
  • What most readers misunderstood about the OpenAI announcement
  • The three ways Jan expects AI to help solve alignment: mechanistic interpretability, generalization, and scalable oversight
  • What the standard should be for confirming whether Jan's team has succeeded
  • Whether OpenAI should (or will) commit to stop training more powerful general models if they don't think the alignment problem has been solved
  • Whether Jan thinks OpenAI has deployed models too quickly or too slowly
  • The many other actors who also have to do their jobs really well if we're going to have a good AI future
  • Plenty more


Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript.

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore

Episodes (299)

#206 – Anil Seth on the predictive brain and how to study consciousness

"In that famous example of the dress, half of the people in the world saw [blue and black], half saw [white and gold]. It turns out there’s individual differences in how brains take into account ambient light. Colour is one example where it’s pretty clear that what we experience is a kind of inference: it’s the brain’s best guess about what’s going on in some way out there in the world. And that’s the claim that I’ve taken on board as a general hypothesis for consciousness: that all our perceptual experiences are inferences about something we don’t and cannot have direct access to." —Anil SethIn today’s episode, host Luisa Rodriguez speaks to Anil Seth — director of the Sussex Centre for Consciousness Science — about how much we can learn about consciousness by studying the brain.Links to learn more, highlights, and full transcript.They cover:What groundbreaking studies with split-brain patients and blindsight have already taught us about the nature of consciousness.Anil’s theory that our perception is a “controlled hallucination” generated by our predictive brains.Whether looking for the parts of the brain that correlate with consciousness is the right way to learn about what consciousness is.Whether our theories of human consciousness can be applied to nonhuman animals.Anil’s thoughts on whether machines could ever be conscious.Disagreements and open questions in the field of consciousness studies, and what areas Anil is most excited to explore next.And much more.Chapters:Cold open (00:00:00)Luisa’s intro (00:01:02)The interview begins (00:02:42)How expectations and perception affect consciousness (00:03:05)How the brain makes sense of the body it’s within (00:21:33)Psychedelics and predictive processing (00:32:06)Blindsight and visual consciousness (00:36:45)Split-brain patients (00:54:56)Overflow experiments (01:05:28)How much can we learn about consciousness from empirical research? (01:14:23)Which parts of the brain are responsible for conscious experiences? (01:27:37)Current state and disagreements in the study of consciousness (01:38:36)Digital consciousness (01:55:55)Consciousness in nonhuman animals (02:18:11)What’s next for Anil (02:30:18)Luisa’s outro (02:32:46)Producer: Keiran HarrisAudio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongContent editing: Luisa Rodriguez, Katy Moore, and Keiran HarrisTranscriptions: Katy Moore

1 Nov 2024 · 2h 33min

How much does a vote matter? (Article)

If you care about social impact, is voting important? In this piece, Rob investigates the two key things that determine the impact of your vote:

  • The chances of your vote changing an election’s outcome.
  • How much better some candidates are for the world as a whole, compared to others.

He then discusses a couple of the best arguments against voting in important elections, namely:

  • If an election is competitive, that means other people disagree about which option is better, and you’re at some risk of voting for the worse candidate by mistake.
  • While voting itself doesn’t take long, knowing enough to accurately pick which candidate is better for the world actually does take substantial effort — effort that could be better allocated elsewhere.

Finally, Rob covers the impact of donating to campaigns or working to "get out the vote," which can be effective ways to generate additional votes for your preferred candidate.

We last released this article in October 2020, but we think it largely still stands up today.

Chapters:

  • Rob's intro (00:00:00)
  • Introduction (00:01:12)
  • What's coming up (00:02:35)
  • The probability of one vote changing an election (00:03:58)
  • How much does it matter who wins? (00:09:29)
  • What if you’re wrong? (00:16:38)
  • Is deciding how to vote too much effort? (00:21:47)
  • How much does it cost to drive one extra vote? (00:25:13)
  • Overall, is it altruistic to vote? (00:29:38)
  • Rob's outro (00:31:19)

Producer: Keiran Harris

28 Oct 2024 · 32min

#205 – Sébastien Moro on the most insane things fish can do

"You have a tank split in two parts: if the fish gets in the compartment with a red circle, it will receive food, and food will be delivered in the other tank as well. If the fish takes the blue triangle, this fish will receive food, but nothing will be delivered in the other tank. So we have a prosocial choice and antisocial choice. When there is no one in the other part of the tank, the male is choosing randomly. If there is a male, a possible rival: antisocial — almost 100% of the time. Now, if there is his wife — his female, this is a prosocial choice all the time."And now a question: Is it just because this is a female or is it just for their female? Well, when they're bringing a new female, it’s the antisocial choice all the time. Now, if there is not the female of the male, it will depend on how long he's been separated from his female. At first it will be antisocial, and after a while he will start to switch to prosocial choices." —Sébastien MoroIn today’s episode, host Luisa Rodriguez speaks to science writer and video blogger Sébastien Moro about the latest research on fish consciousness, intelligence, and potential sentience.Links to learn more, highlights, and full transcript.They cover:The insane capabilities of fish in tests of memory, learning, and problem-solving.Examples of fish that can beat primates on cognitive tests and recognise individual human faces.Fishes’ social lives, including pair bonding, “personalities,” cooperation, and cultural transmission.Whether fish can experience emotions, and how this is even studied.The wild evolutionary innovations of fish, who adapted to thrive in diverse environments from mangroves to the deep sea.How some fish have sensory capabilities we can’t even really fathom — like “seeing” electrical fields and colours we can’t perceive.Ethical issues raised by evidence that fish may be conscious and experience suffering.And plenty more.Producer: Keiran HarrisAudio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongContent editing: Luisa Rodriguez, Katy Moore, and Keiran HarrisTranscriptions: Katy Moore

23 Oct 2024 · 3h 11min

#204 – Nate Silver on making sense of SBF, and his biggest critiques of effective altruism

Rob Wiblin speaks with FiveThirtyEight election forecaster and author Nate Silver about his new book: On the Edge: The Art of Risking Everything.

Links to learn more, highlights, video, and full transcript.

On the Edge explores a cultural grouping Nate dubs “the River” — made up of people who are analytical, competitive, quantitatively minded, risk-taking, and willing to be contrarian. It’s a tendency he considers himself a part of, and the River has been doing well for itself in recent decades — gaining cultural influence through success in finance, technology, gambling, philanthropy, and politics, among other pursuits.

But on Nate’s telling, it’s a group particularly vulnerable to oversimplification and hubris. Where Riverians’ ability to calculate the “expected value” of actions isn’t as good as they believe, their poorly calculated bets can leave a trail of destruction — aptly demonstrated by Nate’s discussion of the extended time he spent with FTX CEO Sam Bankman-Fried before and after his downfall.

Given this show’s focus on the world’s most pressing problems and how to solve them, we narrow in on Nate’s discussion of effective altruism (EA), which has been little covered elsewhere. Nate met many leaders and members of the EA community in researching the book and has watched its evolution online for many years.

Effective altruism is the River style of doing good, because of its willingness to buck both fashion and common sense — making its giving decisions based on mathematical calculations and analytical arguments with the goal of maximising an outcome.

Nate sees a lot to admire in this, but the book paints a mixed picture in which effective altruism is arguably too trusting, too utilitarian, too selfless, and too reckless at some times, while too image-conscious at others.

But while everything has arguable weaknesses, could Nate actually do any better in practice? We ask him:

  • How would Nate spend $10 billion differently than today’s philanthropists influenced by EA?
  • Is anyone else competitive with EA in terms of impact per dollar?
  • Does he have any big disagreements with 80,000 Hours’ advice on how to have impact?
  • Is EA too big a tent to function?
  • What global problems could EA be ignoring?
  • Should EA be more willing to court controversy?
  • Does EA’s niceness leave it vulnerable to exploitation?
  • What moral philosophy would he have modelled EA on?

Rob and Nate also talk about:

  • Nate’s theory of Sam Bankman-Fried’s psychology.
  • Whether we had to “raise or fold” on COVID.
  • Whether Sam Altman and Sam Bankman-Fried are structurally similar cases or not.
  • “Winners’ tilt.”
  • Whether it’s selfish to slow down AI progress.
  • The ridiculous 13 Keys to the White House.
  • Whether prediction markets are now overrated.
  • Whether venture capitalists talk a big talk about risk while pushing all the risk off onto the entrepreneurs they fund.
  • And plenty more.

Chapters:

  • Cold open (00:00:00)
  • Rob's intro (00:01:03)
  • The interview begins (00:03:08)
  • Sam Bankman-Fried and trust in the effective altruism community (00:04:09)
  • Expected value (00:19:06)
  • Similarities and differences between Sam Altman and SBF (00:24:45)
  • How would Nate do EA differently? (00:31:54)
  • Reservations about utilitarianism (00:44:37)
  • Game theory equilibrium (00:48:51)
  • Differences between EA culture and rationalist culture (00:52:55)
  • What would Nate do with $10 billion to donate? (00:57:07)
  • COVID strategies and tradeoffs (01:06:52)
  • Is it selfish to slow down AI progress? (01:10:02)
  • Democratic legitimacy of AI progress (01:18:33)
  • Dubious election forecasting (01:22:40)
  • Assessing how reliable election forecasting models are (01:29:58)
  • Are prediction markets overrated? (01:41:01)
  • Venture capitalists and risk (01:48:48)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

16 Oct 2024 · 1h 57min

#203 – Peter Godfrey-Smith on interfering with wild nature, accepting death, and the origin of complex civilisation

"In the human case, it would be mistaken to give a kind of hour-by-hour accounting. You know, 'I had +4 level of experience for this hour, then I had -2 for the next hour, and then I had -1' — and you sort of sum to try to work out the total… And I came to think that something like that will be applicable in some of the animal cases as well… There are achievements, there are experiences, there are things that can be done in the face of difficulty that might be seen as having the same kind of redemptive role, as casting into a different light the difficult events that led up to it."The example I use is watching some birds successfully raising some young, fighting off a couple of rather aggressive parrots of another species that wanted to fight them, prevailing against difficult odds — and doing so in a way that was so wholly successful. It seemed to me that if you wanted to do an accounting of how things had gone for those birds, you would not want to do the naive thing of just counting up difficult and less-difficult hours. There’s something special about what’s achieved at the end of that process." —Peter Godfrey-SmithIn today’s episode, host Luisa Rodriguez speaks to Peter Godfrey-Smith — bestselling author and science philosopher — about his new book, Living on Earth: Forests, Corals, Consciousness, and the Making of the World.Links to learn more, highlights, and full transcript.They cover:Why octopuses and dolphins haven’t developed complex civilisation despite their intelligence.How the role of culture has been crucial in enabling human technological progress.Why Peter thinks the evolutionary transition from sea to land was key to enabling human-like intelligence — and why we should expect to see that in extraterrestrial life too.Whether Peter thinks wild animals’ lives are, on balance, good or bad, and when, if ever, we should intervene in their lives.Whether we can and should avoid death by uploading human minds.And plenty more.Chapters:Cold open (00:00:00)Luisa's intro (00:00:57)The interview begins (00:02:12)Wild animal suffering and rewilding (00:04:09)Thinking about death (00:32:50)Uploads of ourselves (00:38:04)Culture and how minds make things happen (00:54:05)Challenges for water-based animals (01:01:37)The importance of sea-to-land transitions in animal life (01:10:09)Luisa's outro (01:23:43)Producer: Keiran HarrisAudio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongContent editing: Luisa Rodriguez, Katy Moore, and Keiran HarrisTranscriptions: Katy Moore

3 Oct 2024 · 1h 25min

Luisa and Keiran on free will, and the consequences of never feeling enduring guilt or shame

In this episode from our second show, 80k After Hours, Luisa Rodriguez and Keiran Harris chat about the consequences of letting go of enduring guilt, shame, anger, and pride.

Links to learn more, highlights, and full transcript.

They cover:

  • Keiran’s views on free will, and how he came to hold them
  • What it’s like not experiencing sustained guilt, shame, and anger
  • Whether Luisa would become a worse person if she felt less guilt and shame — specifically whether she’d work fewer hours, or donate less money, or become a worse friend
  • Whether giving up guilt and shame also means giving up pride
  • The implications for love
  • The neurological condition ‘Jerk Syndrome’
  • And some practical advice on feeling less guilt, shame, and anger

Who this episode is for:

  • People sympathetic to the idea that free will is an illusion
  • People who experience tons of guilt, shame, or anger
  • People worried about what would happen if they stopped feeling tonnes of guilt, shame, or anger

Who this episode isn’t for:

  • People strongly in favour of retributive justice
  • Philosophers who can’t stand random non-philosophers talking about philosophy
  • Non-philosophers who can’t stand random non-philosophers talking about philosophy

Chapters:

  • Cold open (00:00:00)
  • Luisa's intro (00:01:16)
  • The chat begins (00:03:15)
  • Keiran's origin story (00:06:30)
  • Charles Whitman (00:11:00)
  • Luisa's origin story (00:16:41)
  • It's unlucky to be a bad person (00:19:57)
  • Doubts about whether free will is an illusion (00:23:09)
  • Acting this way just for other people (00:34:57)
  • Feeling shame over not working enough (00:37:26)
  • First person / third person distinction (00:39:42)
  • Would Luisa become a worse person if she felt less guilt? (00:44:09)
  • Feeling bad about not being a different person (00:48:18)
  • Would Luisa donate less money? (00:55:14)
  • Would Luisa become a worse friend? (01:01:07)
  • Pride (01:08:02)
  • Love (01:15:35)
  • Bears and hurricanes (01:19:53)
  • Jerk Syndrome (01:24:24)
  • Keiran's outro (01:34:47)

Get more episodes like this by subscribing to our more experimental podcast on the world’s most pressing problems and how to solve them: type "80k After Hours" into your podcasting app.

Producer: Keiran Harris
Audio mastering: Milo McGuire
Transcriptions: Katy Moore

27 Sep 2024 · 1h 36min

#202 – Venki Ramakrishnan on the cutting edge of anti-ageing science

"For every far-out idea that turns out to be true, there were probably hundreds that were simply crackpot ideas. In general, [science] advances building on the knowledge we have, and seeing what the next questions are, and then getting to the next stage and the next stage and so on. And occasionally there’ll be revolutionary ideas which will really completely change your view of science. And it is possible that some revolutionary breakthrough in our understanding will come about and we might crack this problem, but there’s no evidence for that. It doesn’t mean that there isn’t a lot of promising work going on. There are many legitimate areas which could lead to real improvements in health in old age. So I’m fairly balanced: I think there are promising areas, but there’s a lot of work to be done to see which area is going to be promising, and what the risks are, and how to make them work." —Venki RamakrishnanIn today’s episode, host Luisa Rodriguez speaks to Venki Ramakrishnan — molecular biologist and Nobel Prize winner — about his new book, Why We Die: The New Science of Aging and the Quest for Immortality.Links to learn more, highlights, and full transcript.They cover:What we can learn about extending human lifespan — if anything — from “immortal” aquatic animal species, cloned sheep, and the oldest people to have ever lived.Which areas of anti-ageing research seem most promising to Venki — including caloric restriction, removing senescent cells, cellular reprogramming, and Yamanaka factors — and which Venki thinks are overhyped.Why eliminating major age-related diseases might only extend average lifespan by 15 years.The social impacts of extending healthspan or lifespan in an ageing population — including the potential danger of massively increasing inequality if some people can access life-extension interventions while others can’t.And plenty more.Chapters:Cold open (00:00:00)Luisa's intro (00:01:04)The interview begins (00:02:21)Reasons to explore why we age and die (00:02:35)Evolutionary pressures and animals that don't biologically age (00:06:55)Why does ageing cause us to die? (00:12:24)Is there a hard limit to the human lifespan? (00:17:11)Evolutionary tradeoffs between fitness and longevity (00:21:01)How ageing resets with every generation, and what we can learn from clones (00:23:48)Younger blood (00:31:20)Freezing cells, organs, and bodies (00:36:47)Are the goals of anti-ageing research even realistic? (00:43:44)Dementia (00:49:52)Senescence (01:01:58)Caloric restriction and metabolic pathways (01:11:45)Yamanaka factors (01:34:07)Cancer (01:47:44)Mitochondrial dysfunction (01:58:40)Population effects of extended lifespan (02:06:12)Could increased longevity increase inequality? (02:11:48)What’s surprised Venki about this research (02:16:06)Luisa's outro (02:19:26)Producer: Keiran HarrisAudio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongContent editing: Luisa Rodriguez, Katy Moore, and Keiran HarrisTranscriptions: Katy Moore

19 Sep 2024 · 2h 20min

#201 – Ken Goldberg on why your robot butler isn’t here yet

"Perception is quite difficult with cameras: even if you have a stereo camera, you still can’t really build a map of where everything is in space. It’s just very difficult. And I know that sounds surprising, because humans are very good at this. In fact, even with one eye, we can navigate and we can clear the dinner table. But it seems that we’re building in a lot of understanding and intuition about what’s happening in the world and where objects are and how they behave. For robots, it’s very difficult to get a perfectly accurate model of the world and where things are. So if you’re going to go manipulate or grasp an object, a small error in that position will maybe have your robot crash into the object, a delicate wine glass, and probably break it. So the perception and the control are both problems." —Ken GoldbergIn today’s episode, host Luisa Rodriguez speaks to Ken Goldberg — robotics professor at UC Berkeley — about the major research challenges still ahead before robots become broadly integrated into our homes and societies.Links to learn more, highlights, and full transcript.They cover:Why training robots is harder than training large language models like ChatGPT.The biggest engineering challenges that still remain before robots can be widely useful in the real world.The sectors where Ken thinks robots will be most useful in the coming decades — like homecare, agriculture, and medicine.Whether we should be worried about robot labour affecting human employment.Recent breakthroughs in robotics, and what cutting-edge robots can do today.Ken’s work as an artist, where he explores the complex relationship between humans and technology.And plenty more.Chapters:Cold open (00:00:00)Luisa's intro (00:01:19)General purpose robots and the “robotics bubble” (00:03:11)How training robots is different than training large language models (00:14:01)What can robots do today? (00:34:35)Challenges for progress: fault tolerance, multidimensionality, and perception (00:41:00)Recent breakthroughs in robotics (00:52:32)Barriers to making better robots: hardware, software, and physics (01:03:13)Future robots in home care, logistics, food production, and medicine (01:16:35)How might robot labour affect the job market? (01:44:27)Robotics and art (01:51:28)Luisa's outro (02:00:55)Producer: Keiran HarrisAudio engineering: Dominic Armstrong, Ben Cordell, Milo McGuire, and Simon MonsourContent editing: Luisa Rodriguez, Katy Moore, and Keiran HarrisTranscriptions: Katy Moore

13 Sep 2024 · 2h 1min
