#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.

Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."

Links to learn more, summary and full transcript.

Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it’s not just throwing compute at the problem -- it’s also hiring dozens of scientists and engineers to build out the Superalignment team.

Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains:

Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did.


Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.

The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy -- as one person described it, “like using one fire to put out another fire.”

But Jan’s thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves.

And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.

Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.

Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:

  • If you don't know how to solve alignment, how can you tell that your alignment assistant AIs are actually acting in your interest rather than working against you? Especially as they could just be pretending to care about what you care about.
  • How do you know that these technical problems can be solved at all, even in principle?
  • At the point that models are able to help with alignment, won't they also be so good at improving capabilities that we're in the middle of an explosion in what AI can do?


In today's interview, host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:

  • OpenAI's current plans to achieve 'superalignment' and the reasoning behind them
  • Why alignment work is the most fundamental and scientifically interesting research in ML
  • The kinds of people he’s excited to hire to join his team and maybe save the world
  • What most readers misunderstood about the OpenAI announcement
  • The three ways Jan expects AI to help solve alignment: mechanistic interpretability, generalization, and scalable oversight
  • What the standard should be for confirming whether Jan's team has succeeded
  • Whether OpenAI should (or will) commit to stop training more powerful general models if they don't think the alignment problem has been solved
  • Whether Jan thinks OpenAI has deployed models too quickly or too slowly
  • The many other actors who also have to do their jobs really well if we're going to have a good AI future
  • Plenty more


Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript.

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore

Episodes (294)

#13 - Claire Walsh on testing which policies work & how to get governments to listen to the results

In both rich and poor countries, government policy is often based on no evidence at all and many programs don’t work. This has particularly harsh effects on the global poor - in some countries governments only spend $100 on each citizen a year, so they can’t afford to waste a single dollar.

Enter MIT’s Poverty Action Lab (J-PAL). Since 2003 they’ve conducted experiments to figure out what policies actually help recipients, and then tried to get them implemented by governments and non-profits. Claire Walsh leads J-PAL’s Government Partnership Initiative, which works to evaluate policies and programs in collaboration with developing world governments, scale policies that have been shown to work, and generally promote a culture of evidence-based policymaking.

Summary, links to career opportunities and topics discussed in the show.

We discussed (her views only, not J-PAL’s):

  • How can they get evidence-backed policies adopted? Do politicians in the developing world even care whether their programs actually work? Is the norm evidence-based policy, or policy-based evidence?
  • Is evidence-based policy an evidence-based strategy itself?
  • Which policies does she think would have a particularly large impact on human welfare relative to their cost?
  • How did she come to lead one of J-PAL’s departments at 29?
  • How do you evaluate the effectiveness of energy and environment programs (Walsh’s area of expertise), and what are the standout approaches in that area?
  • 80,000 Hours has warned people about the downsides of starting your career in a non-profit. Walsh started her career in a non-profit and has thrived, so are we making a mistake?
  • Other than J-PAL, what are the best places to work in development? What are the best subjects to study? Where can you go to network to break into the sector?
  • Is living in poverty as bad as we think?

And plenty of other things besides. We haven’t run an RCT to test whether this episode will actually help your career, but I suggest you listen anyway. Trust my intuition on this one.

31 Oct 2017 · 52min

#12 - Beth Cameron works to stop you dying in a pandemic. Here’s what keeps her up at night.

“When you're in the middle of a crisis and you have to ask for money, you're already too late.”

That’s Dr Beth Cameron, who leads Global Biological Policy and Programs at the Nuclear Threat Initiative. Beth should know. She has years of experience preparing for and fighting the diseases of our nightmares, on the White House Ebola Taskforce, in the National Security Council staff, and as the Assistant Secretary of Defense for Nuclear, Chemical and Biological Defense Programs.

Summary, list of career opportunities, extra links to learn more and coaching application.

Unfortunately, the countries of the world aren’t prepared for a crisis - and like children crowded into daycare, there’s a good chance something will make us all sick at once. During past pandemics countries have dragged their feet over who will pay to contain them, or struggled to move people and supplies where they needed to be. At the same time advanced biotechnology threatens to make it possible for terrorists to bring back smallpox - or create something even worse.

In this interview we look at the current state of play in disease control, what needs to change, and how you can build the career capital necessary to make those changes yourself. That includes:

  • What and where to study, and where to begin a career in pandemic preparedness. Below you’ll find a lengthy list of people and places mentioned in the interview, and others we’ve had recommended to us.
  • How the Nuclear Threat Initiative, with just 50 people, collaborates with governments around the world to reduce the risk of nuclear or biological catastrophes, and whether they might want to hire you.
  • The best strategy for containing pandemics.
  • Why we lurch from panic, to neglect, to panic again when it comes to protecting ourselves from contagious diseases.
  • Current reform efforts within the World Health Organisation, and attempts to prepare partial vaccines ahead of time.
  • Which global health security groups most impress Beth, and what they’re doing.
  • What new technologies could be invented to make us safer.
  • Whether it’s possible to help solve the problem through mass advocacy.
  • Much more besides.

Get free, one-on-one career advice to improve biosecurity

Considering a relevant grad program like a biology PhD, medicine, or security studies? Able to apply for a relevant job already? We’ve helped dozens of people plan their careers to work on pandemic preparedness and put them in touch with mentors. If you want to work on the problem discussed in this episode, you should apply for coaching: Read more

25 Oct 2017 · 1h 45min

#11 - Spencer Greenberg on speeding up social science 10-fold & why plenty of startups cause harm

Do most meat eaters think it’s wrong to hurt animals? Do Americans think climate change is likely to cause human extinction? What is the best, state-of-the-art therapy for depression? How can we make academics more intellectually honest, so we can actually trust their findings? How can we speed up social science research ten-fold? Do most startups improve the world, or make it worse? If you’re interested in these questions, this interview is for you.

Click for a full transcript, links discussed in the show, etc.

A scientist, entrepreneur, writer and mathematician, Spencer Greenberg is constantly working to create tools to speed up and improve research and critical thinking. These include:

  • Rapid public opinion surveys to find out what most people actually think about animal consciousness, farm animal welfare, the impact of developing world charities and the likelihood of extinction by various different means;
  • Tools to enable social science research to be run en masse very cheaply;
  • ClearerThinking.org, a highly popular site for improving people’s judgement and decision-making;
  • Ways to transform data analysis methods to ensure that papers only show true findings;
  • Innovative research methods;
  • Ways to decide which research projects are actually worth pursuing.

In this interview, Spencer discusses all of these and more. If you don’t feel like listening, that just shows that you have poor judgement and need to benefit from his wisdom even more!

Get free, one-on-one career advice

We’ve helped hundreds of people compare their options, get introductions, and find high impact jobs. If you want to work on any of the problems discussed in this episode, find out if our coaching can help you.

17 Oct 2017 · 1h 29min

#10 - Nick Beckstead on how to spend billions of dollars preventing human extinction

What if you were in a position to give away billions of dollars to improve the world? What would you do with it? This is the problem facing Program Officers at the Open Philanthropy Project - people like Dr Nick Beckstead.

Following a PhD in philosophy, Nick works to figure out where money can do the most good. He’s been involved in major grants in a wide range of areas, including ending factory farming through technological innovation, safeguarding the world from advances in biotechnology and artificial intelligence, and spreading rational compassion.

Full transcript, coaching application form, overview of the conversation, and links to resources discussed in the episode:

This episode is a tour through some of the toughest questions ‘effective altruists’ face when figuring out how to best improve the world, including:

  • Should we mostly try to help people currently alive, or future generations? Nick studied this question for years in his PhD thesis, On the Overwhelming Importance of Shaping the Far Future. (The first 31 minutes is a snappier version of my conversation with Toby Ord.)
  • Is clean meat (aka in vitro meat) technologically feasible any time soon, or should we be looking for plant-based alternatives?
  • What are the greatest risks to human civilisation?
  • To stop malaria, is it more cost-effective to use technology to eliminate mosquitos or to distribute bed nets?
  • Should people who want to improve the future work for changes that will be very useful in a specific scenario, or just generally try to improve how well humanity makes decisions?
  • What specific jobs should our listeners take in order for Nick to be able to spend more money in useful ways to improve the world?
  • Should we expect the future to be better if the economy grows more quickly - or more slowly?

Get free, one-on-one career advice

We’ve helped dozens of people compare their options, get introductions, and find jobs important for the long-run future. If you want to work on any of the problems discussed in this episode, find out if our coaching can help you.

11 Oct 2017 · 1h 51min

#9 - Christine Peterson on how insecure computers could lead to global disaster, and how to fix it

Take a trip to Silicon Valley in the 70s and 80s, when going to space sounded like a good way to get around environmental limits, people started cryogenically freezing themselves, and nanotechnology looked like it might revolutionise industry – or turn us all into grey goo.

Full transcript, coaching application form, overview of the conversation, and extra resources to learn more:

In this episode of the 80,000 Hours Podcast Christine Peterson takes us back to her youth in the Bay Area, the ideas she encountered there, and what the dreamers she met did as they grew up. We also discuss how she came up with the term ‘open source software’ (and how she had to get someone else to propose it).

Today Christine helps run the Foresight Institute, which fills a gap left by for-profit technology companies – predicting how new revolutionary technologies could go wrong, and ensuring we steer clear of the downsides.

We dive into:

  • Whether the poor security of computer systems poses a catastrophic risk for the world. Could all our essential services be taken down at once? And if so, what can be done about it?
  • Can technology ‘move fast and break things’ without eventually breaking the world? Would it be better for technology to advance more quickly, or more slowly?
  • How Christine came up with the term ‘open source software’ (and why someone else had to propose it).
  • Will AIs designed for wide-scale automated hacking make computers more or less secure?
  • Would it be good to radically extend human lifespan? Is it sensible to cryogenically freeze yourself in the hope of being resurrected in the future?
  • Could atomically precise manufacturing (nanotechnology) really work? Why was it initially so controversial and why did people stop worrying about it?
  • Should people who try to do good in their careers work long hours and take low salaries? Or should they take care of themselves first of all?
  • How she thinks the effective altruism community resembles the scene she was involved with when she was young, and where it might be going wrong.

Get free, one-on-one career advice

We’ve helped dozens of people compare their options, get introductions, and find jobs important for the long-run future. If you want to work on any of the problems discussed in this episode, find out if our coaching can help you.

4 Oct 2017 · 1h 45min

#8 - Lewis Bollard on how to end factory farming in our lifetimes

Every year tens of billions of animals are raised in terrible conditions in factory farms before being killed for human consumption. Over the last two years Lewis Bollard – Project Officer for Farm Animal Welfare at the Open Philanthropy Project – has conducted extensive research into the best ways to eliminate animal suffering in farms as soon as possible. This has resulted in $30 million in grants to farm animal advocacy.

Full transcript, coaching application form, overview of the conversation, and extra resources to learn more:

We covered almost every approach being taken, which ones work, and how individuals can best contribute through their careers. We also had time to venture into a wide range of issues that are less often discussed, including:

  • Why Lewis thinks insect farming would be worse than the status quo, and whether we should look for ‘humane’ insecticides;
  • How young people can set themselves up to contribute to scientific research into meat alternatives;
  • How genetic manipulation of chickens has caused them to suffer much more than their ancestors, but could also be used to make them better off;
  • Why Lewis is skeptical of vegan advocacy;
  • Why he doubts that much can be done to tackle factory farming through legal advocacy or electoral politics;
  • Which species of farm animals is best to focus on first;
  • Whether fish and crustaceans are conscious, and if so what can be done for them;
  • Many other issues listed below in the Overview of the discussion.

Get free, one-on-one career advice

We’ve helped dozens of people compare their options, get introductions, and find jobs important for the long-run future. If you want to work on any of the problems discussed in this episode, find out if our coaching can help you.

Overview of the discussion

2m40s - What originally drew you to dedicate your career to helping animals and why did Open Philanthropy end up focusing on it?
5m40s - Do you have any concrete way of assessing the severity of animal suffering?
7m10s - Do you think the environmental gains are large compared to those that we might hope to get from animal welfare improvement?
7m55s - What grants have you made at Open Phil? How did you go about deciding which groups to fund and which ones not to fund?
9m50s - Why does Open Phil focus on chickens and fish? Is this the right call?

More...

27 Sep 2017 · 3h 16min

#7 - Julia Galef on making humanity more rational, what EA does wrong, and why Twitter isn’t all bad

The scientific revolution in the 16th century was one of the biggest societal shifts in human history, driven by the discovery of new and better methods of figuring out who was right and who was wrong.

Julia Galef - a well-known writer and researcher focused on improving human judgment, especially about high stakes questions - believes that if we could again develop new techniques to predict the future, resolve disagreements and make sound decisions together, it could dramatically improve the world across the board. We brought her in to talk about her ideas.

This interview complements a new detailed review of whether and how to follow Julia’s career path. Apply for personalised coaching, see what questions are asked when, and read extra resources to learn more.

Julia has been host of the Rationally Speaking podcast since 2010, co-founder of the Center for Applied Rationality in 2012, and is currently working for the Open Philanthropy Project on an investigation of expert disagreements.

In our conversation we ended up speaking about a wide range of topics, including:

  • Her research on how people can have productive intellectual disagreements.
  • Why she once planned to become an urban designer.
  • Why she doubts people are more rational than 200 years ago.
  • What makes her a fan of Twitter (while I think it’s dystopian).
  • Whether people should write more books.
  • Whether it’s a good idea to run a podcast, and how she grew her audience.
  • Why saying you don’t believe X often won’t convince people you don’t.
  • Why she started a PhD in economics but then stopped.
  • Whether she would recommend an unconventional career like her own.
  • Whether the incentives in the intelligence community actually support sound thinking.
  • Whether big institutions will actually pick up new tools for improving decision-making if they are developed.
  • How to start out pursuing a career in which you enhance human judgement and foresight.

Get free, one-on-one career advice to help you improve judgement and decision-making

We’ve helped dozens of people compare their options, get introductions, and find jobs important for the long-run future. If you want to work on any of the problems discussed in this episode, find out if our coaching can help you: APPLY FOR COACHING

Overview of the conversation

1m30s - So what projects are you working on at the moment?
3m50s - How are you working on the problem of expert disagreement?
6m0s - Is this the same method as the double crux process that was developed at the Center for Applied Rationality?
10m - Why did the Open Philanthropy Project decide this was a very valuable project to fund?
13m - Is the double crux process actually that effective?
14m50s - Is Facebook dangerous?
17m - What makes for a good life? Can you be mistaken about having a good life?
19m - Should more people write books?

Read more...

13 Sep 2017 · 1h 14min

#6 - Toby Ord on why the long-term future matters more than anything else & what to do about it

Of all the people whose well-being we should care about, only a small fraction are alive today. The rest are members of future generations who are yet to exist. Whether they’ll be born into a world that is flourishing or disintegrating – and indeed, whether they will ever be born at all – is in large part up to us. As such, the welfare of future generations should be our number one moral concern.

This conclusion holds true regardless of whether your moral framework is based on common sense, consequences, rules of ethical conduct, cooperating with others, virtuousness, keeping options open – or just a sense of wonder about the universe we find ourselves in.

That’s the view of Dr Toby Ord, a philosophy Fellow at the University of Oxford and co-founder of the effective altruism community. In this episode of the 80,000 Hours Podcast Dr Ord makes the case that aiming for a positive long-term future is likely the best way to improve the world.

Apply for personalised coaching, see what questions are asked when, and read extra resources to learn more.

We then discuss common objections to long-termism, such as the idea that benefits to future generations are less valuable than those to people alive now, or that we can’t meaningfully benefit future generations beyond taking the usual steps to improve the present.

Later the conversation turns to how individuals can and have changed the course of history, what could go wrong and why, and whether plans to colonise Mars would actually put humanity in a safer position than it is today.

This episode goes deep into the most distinctive features of our advice. It’s likely the most in-depth discussion of how 80,000 Hours and the effective altruism community think about the long-term future and why - and why we so often give it top priority.

It’s best to subscribe, so you can listen at leisure on your phone, speed up the conversation if you like, and get notified about future episodes. You can do so by searching ‘80,000 Hours’ wherever you get your podcasts.

Want to help ensure humanity has a positive future instead of destroying itself? We want to help. We’ve helped 100s of people compare their options, get introductions, and find jobs important for the long-run future. If you want to work on any of the problems discussed in this episode, such as artificial intelligence or biosecurity, find out if our coaching can help you.

Overview of the discussion

3m30s - Why is the long-term future of humanity such a big deal, and perhaps the most important issue for us to be thinking about?
9m05s - Five arguments that future generations matter
21m50s - How bad would it be if humanity went extinct or civilisation collapsed?
26m40s - Why do people start saying such strange things when this topic comes up?
30m30s - Are there any other reasons to prioritize thinking about the long-term future of humanity that you wanted to raise before we move to objections?
36m10s - What is this school of thought called?

Read more...

6 Sep 2017 · 2h 8min
