#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph — one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety-focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.
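To make that gating concrete, here is a minimal sketch (an assumption-laden illustration, not Anthropic's actual tooling) of the evaluate-then-gate logic the RSP approach implies: test the trained model for dangerous capabilities, and block wider deployment until every triggered capability's required safeguards are in place. The capability names and safeguard labels below are invented for illustration, not terms from the policy itself.

```python
# A minimal sketch of RSP-style deployment gating. All names are illustrative
# assumptions; they do not come from Anthropic's actual policy.

REQUIRED_SAFEGUARDS: dict[str, set[str]] = {
    # Illustrative capability -> safeguard mapping, echoing the examples above.
    "bioweapon_design_uplift": {"weights_secured_against_theft"},
    "autonomous_self_exfiltration": {
        "weights_secured_against_theft",
        "exfiltration_shown_impossible",
    },
}

def may_deploy_widely(detected: set[str], safeguards: set[str]) -> bool:
    """True only if every detected capability's safeguards are all in place."""
    return all(REQUIRED_SAFEGUARDS.get(cap, set()) <= safeguards for cap in detected)

# Example: evaluations flag bioweapon uplift, but weights aren't yet secured.
print(may_deploy_widely({"bioweapon_design_uplift"}, set()))  # False
print(may_deploy_widely({"bioweapon_design_uplift"},
                        {"weights_secured_against_theft"}))   # True
```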

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

Episodes (297)

#39 - Spencer Greenberg on the scientific approach to solving difficult everyday questions

Will Trump be re-elected? Will North Korea give up their nuclear weapons? Will your friend turn up to dinner? Spencer Greenberg, founder of ClearerThinking.org, has a process for working out such real-life problems. Let’s work through one here: how likely is it that you’ll enjoy listening to this episode?

The first step is to figure out your ‘prior probability’: what’s your estimate of how likely you are to enjoy the interview before getting any further evidence? Other than applying common sense, one way to figure this out is called reference class forecasting: looking at similar cases and seeing how often something is true, on average.

Spencer is our first ever return guest. So one reference class might be: how many Spencer Greenberg episodes of the 80,000 Hours Podcast have you enjoyed so far? Being this specific limits bias in your answer, but with a sample size of at most 1, you’d probably want to add more data points to reduce variability. Zooming out: how many episodes of the 80,000 Hours Podcast have you enjoyed? Let’s say you’ve listened to 10, and enjoyed 8 of them. If so, 8 out of 10 might be your prior probability.

But maybe the two you didn’t enjoy had something in common. If you’ve liked similar episodes in the past, you’d update in favour of expecting to enjoy it, and if you’ve disliked similar episodes in the past, you’d update negatively. You can zoom out further: what fraction of long-form interview podcasts have you ever enjoyed?

Then you’d look to update whenever new information became available. Do the topics seem interesting? Did Spencer make a great point in the first 5 minutes? Was this description unbearably self-referential? Speaking of the Question of Evidence: in a world where Spencer was not worth listening to, how likely is it that we’d invite him back for a second episode?

Links to learn more, summary and full transcript.

We’ll run through several diverse examples, and how to actually work out the changing probabilities as you update (there’s a minimal worked sketch after this description). But that’s only a fraction of the conversation. We also discuss:

  • How could we generate 20-30 new happy thoughts a day? What would that do to our welfare?
  • What do people actually value? How do EAs differ from non-EAs?
  • Why should we care about the distinction between intrinsic and instrumental values?
  • Would hedonic utilitarians really want to hook themselves up to happiness machines?
  • What types of activities are people generally under-confident about? Why?
  • When should you give a lot of weight to your prior belief?
  • When should we trust common sense?
  • Does power posing have any effect?
  • Are resumes worthless?
  • Did Trump explicitly collude with Russia? What are the odds of him getting re-elected?
  • What’s the probability that China and the US go to war in the 21st century?
  • How should we treat claims of expertise on diets?
  • Why were Spencer’s friends suspicious of Theranos for years?
  • How should we think about the placebo effect?
  • Does a shift towards rationality typically cause alienation from family and friends? How do you deal with that?

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.

The 80,000 Hours podcast is produced by Keiran Harris.
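Here is that worked sketch: a reference-class prior updated with Bayes’ rule as evidence arrives. The likelihood numbers are invented purely for illustration; nothing here is from the episode itself.

```python
# A minimal sketch (with made-up numbers) of the updating process described
# above: start from a reference-class prior, then apply Bayes' rule.

def bayes_update(prior: float, p_e_given_yes: float, p_e_given_no: float) -> float:
    """P(enjoy | evidence) from P(enjoy) and the two likelihoods of the evidence."""
    joint_yes = p_e_given_yes * prior
    joint_no = p_e_given_no * (1 - prior)
    return joint_yes / (joint_yes + joint_no)

# Reference-class prior: you enjoyed 8 of the 10 episodes you've heard.
prior = 8 / 10

# Evidence: Spencer makes a great point in the first 5 minutes. Assume (made
# up) that happens in 90% of episodes you'd enjoy and 50% of ones you wouldn't.
posterior = bayes_update(prior, p_e_given_yes=0.9, p_e_given_no=0.5)

print(f"prior = {prior:.2f}, posterior = {posterior:.2f}")  # 0.80 -> 0.88
```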

7 Aug 2018 · 2h 17min

#38 - Yew-Kwang Ng on anticipating effective altruism decades ago & how to make a much happier world

Will people who think carefully about how to maximize welfare eventually converge on the same views? The effective altruism community has spent a lot of time over the past 10 years debating how best to increase happiness and reduce suffering, and gradually narrowed in on the world’s poorest people, all animals capable of suffering, and future generations.

Yew-Kwang Ng, Professor of Economics at Nanyang Technological University in Singapore, had been working independently on this exact question since the 70s. Many of his conclusions have ended up foreshadowing what is now conventional wisdom within effective altruism - though other views he holds remain controversial or little-known.

For instance, he thinks we ought to explore increasing pleasure via direct brain stimulation, and that genetic engineering may be an important tool for increasing happiness in the future. His work has suggested that the welfare of most wild animals is on balance negative, and he thinks that in the future this is a problem humanity might work to solve. Yet he thinks that greatly improved conditions for farm animals could eventually justify eating meat. He has spent most of his life advocating for the view that happiness, broadly construed, is the only intrinsically valuable thing. If it’s true that careful researchers will converge as Prof Ng believes, these ideas may prove as prescient as his other, now widely accepted, opinions.

Link to our summary and appreciation of Kwang’s top publications and insights throughout a lifetime of research.

Kwang has led an exceptional life. While in high school he was drawn to physics, mathematics, and philosophy, yet he chose to study economics because of his dream: to establish communism in an independent Malaya. But events in the Soviet Union and China, in addition to his burgeoning knowledge and academic appreciation of economics, would change his views about the practicability of communism. He would soon complete his journey from young revolutionary to academic economist, and eventually become a columnist writing in support of Deng Xiaoping’s Chinese economic reforms in the 80s.

He got his PhD at Sydney University in 1971, and has since published over 250 refereed papers - covering economics, biology, politics, mathematics, philosophy, psychology, and sociology. He’s best known for his work in welfare economics, and proposed ‘welfare biology’ as a new field of study. In 2007, he was made a Distinguished Fellow of the Economic Society of Australia, the highest award that the society bestows.

Links to learn more, summary and full transcript.

In this episode we discuss how he developed some of his most unusual ideas and his fascinating life story, including:

  • Why Kwang believes that ‘Happiness Is Absolute, Universal, Ultimate, Unidimensional, Cardinally Measurable and Interpersonally Comparable’
  • What are the most pressing questions in economics?
  • Did Kwang have to worry about censorship from the Chinese government when promoting market economics, or concern for animal welfare?
  • Welfare economics and where Kwang thinks it went wrong

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: search for '80,000 Hours' in your podcasting app.

The 80,000 Hours Podcast is produced by Keiran Harris.

26 Jul 2018 · 1h 59min

#37 - GiveWell picks top charities by estimating the unknowable. James Snowden on how they do it.

What’s the value of preventing the death of a 5-year-old child, compared to a 20-year-old, or an 80-year-old? The global health community has generally regarded the value as proportional to the number of health-adjusted life-years the person has remaining - but GiveWell, one of the world’s foremost charity evaluators, no longer uses that approach. They found that, contrary to the ‘years-remaining’ method, many of their staff actually value preventing the death of an adult more than preventing the death of a young child. However, there’s plenty of disagreement: the team’s estimates of the relative value span a four-fold range.

As James Snowden - a research consultant at GiveWell - explains in this episode, there’s no way around making these controversial judgement calls based on limited information. If you try to ignore a question like this, you just implicitly take an unreflective stand on it instead. And for each charity they look into there are one or two dozen of these highly uncertain parameters they need to estimate.

GiveWell has been trying to find better ways to make these decisions since its inception in 2007. Lives hang in the balance, so they want their staff to say what they really believe and bring their private knowledge to the table, rather than just defer to an imaginary consensus. Their strategy is a massive spreadsheet that lists dozens of things they need to estimate, with every staff member asked to give a figure and justification. Then once a year, the GiveWell team get together and try to identify what they really disagree about and think through what evidence it would take to change their minds.

Full transcript, summary of the conversation and links to learn more.

Often the people who have the greatest familiarity with a particular intervention are the ones who drive the decision, as others defer to them. But the group can also end up with very different figures, based on different prior beliefs about moral issues and how the world works. In that case they use the median of everyone’s best guess to make their key decisions (see the sketch after this description).

In making his estimate of the relative badness of dying at different ages, James specifically considered two factors: how many years of life do you lose, and how much interest do you have in those future years? Currently, James believes that the worst time for a person to die is around 8 years of age.

We discuss his experiences with such calculations, as well as a range of other topics:

  • Why GiveWell’s recommendations have changed more than it looks.
  • What are the biggest research priorities for GiveWell at the moment?
  • How do you take into account the long-term knock-on effects from interventions?
  • If GiveWell's advice were going to end up being very different in a couple of years' time, how might that happen?
  • Are there any charities that James thinks are really cost-effective which GiveWell hasn't funded yet?
  • How does domestic government spending in the developing world compare to effective charities?
  • What are the main challenges with policy-related interventions?
  • How much time do you spend discovering new interventions?

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: search for '80,000 Hours' in your podcasting app.

The 80,000 Hours Podcast is produced by Keiran Harris.
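For concreteness, here is that sketch of the median-of-best-guesses step, with entirely invented names and figures; GiveWell’s real spreadsheets are far richer, and nothing below reflects their actual numbers.

```python
from statistics import median

# An illustrative sketch (all figures invented) of the aggregation step above:
# each staff member records a best guess for an uncertain parameter, and the
# median of those guesses feeds into the key decisions.

# Hypothetical estimates of the value of averting an adult death relative to a
# child death, spanning the kind of four-fold range mentioned in the episode.
estimates = {"alice": 0.5, "bob": 0.9, "carol": 1.4, "dan": 2.0}

decision_value = median(estimates.values())
print(f"median used for decisions: {decision_value}")  # 1.15
```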

16 Jul 2018 · 1h 44min

#36 - Tanya Singh on ending the operations management bottleneck in effective altruism

Almost nobody is able to do groundbreaking physics research themselves, and by the time his brilliance was appreciated, Einstein was hardly limited by funding. But what if you could find a way to unlock the secrets of the universe like Einstein nonetheless?

Today’s guest, Tanya Singh, sees herself as doing something like that every day. She’s Executive Assistant to one of her intellectual heroes who she believes is making a huge contribution to improving the world: Professor Bostrom at Oxford University's Future of Humanity Institute (FHI). She couldn’t get more work out of Bostrom with extra donations, as his salary is already easily covered. But with her superior abilities as an Executive Assistant, Tanya frees up hours of his time every week, essentially ‘buying’ more Bostrom in a way nobody else can. She also helps manage FHI more generally, freeing up more than an hour of other staff time for each hour she works. This gives her leverage to do more good than she could in most other positions.

In our previous episode, Tara Mac Aulay objected to viewing operations work as predominantly a way of freeing up other people's time: “A good ops person doesn’t just allow you to scale linearly, but also can help figure out bottlenecks and solve problems such that the organization is able to do qualitatively different work, rather than just increase the total quantity,” Tara said.

Full transcript, summary and links to learn more.

Tara’s right that buying time for people at the top of their field is just one path to impact, though it’s one Tanya says she finds highly motivating. Other paths include enabling complex projects that would otherwise be impossible, allowing you to hire and grow much faster, and preventing disasters that could bring down a whole organisation - all things that Tanya does at FHI as well.

In today’s episode we discuss all of those approaches, as we dive deeper into the broad class of roles we refer to as ‘operations management’. We cover the arguments we made in ‘Why operations management is one of the biggest bottlenecks in effective altruism’, as well as:

  • Does one really need to hire people aligned with an org’s mission to work in ops?
  • The most notable operations successes in the 20th century.
  • What’s it like being the only operations person in an org?
  • The role of a COO as compared to a CEO, and the options for career progression.
  • How do good operations teams allow orgs to scale quickly?
  • How much do operations staff get to set their org’s strategy?
  • Which personal weaknesses aren’t a huge problem in operations?
  • How do you automate processes? Why don’t most people do this?
  • Cultural differences between Britain and India, where Tanya grew up.

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.

The 80,000 Hours podcast is produced by Keiran Harris.

11 Jul 2018 · 2h 4min

#35 - Tara Mac Aulay on the audacity to fix the world without asking permission

"You don't need permission. You don't need to be allowed to do something that's not in your job description. If you think that it's gonna make your company or your organization more successful and more efficient, you can often just go and do it." How broken is the world? How inefficient is a typical organisation? Looking at Tara Mac Aulay’s life, the answer seems to be ‘very’. At 15 she took her first job - an entry-level position at a chain restaurant. Rather than accept her place, Tara took it on herself to massively improve the store’s shambolic staff scheduling and inventory management. After cutting staff costs 30% she was quickly promoted, and at 16 sent in to overhaul dozens of failing stores in a final effort to save them from closure. That’s just the first in a startling series of personal stories that take us to a hospital drug dispensary where pharmacists are wasting a third of their time, a chemotherapy ward in Bhutan that’s killing its patients rather than saving lives, and eventually the Centre for Effective Altruism, where Tara becomes CEO and leads it through start-up accelerator Y Combinator. In this episode Tara shows how the ability to do practical things, avoid major screw-ups, and design systems that scale, is both rare and precious. Full transcript, key quotes and links to learn more. People with an operations mindset spot failures others can't see and fix them before they bring an organisation down. This kind of resourcefulness can transform the world by making possible critical projects that would otherwise fall flat on their face. But as Tara's experience shows they need to figure out what actually motivates the authorities who often try to block their reforms. We explore how people with this skillset can do as much good as possible, what 80,000 Hours got wrong in our article 'Why operations management is one of the biggest bottlenecks in effective altruism’, as well as: * Tara’s biggest mistakes and how to deal with the delicate politics of organizational reform. * How a student can save a hospital millions with a simple spreadsheet model. * The sociology of Bhutan and how medicine in the developing world often makes things worse rather than better. * What most people misunderstand about operations, and how to tell if you have what it takes. * And finally, operations jobs people should consider applying for, such as those open now at the Centre for Effective Altruism. Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: search for '80,000 Hours' in your podcasting app. The 80,000 Hours Podcast is produced by Keiran Harris.

21 Jun 2018 · 1h 22min

Rob Wiblin on the art/science of a high impact career

Today's episode is a cross-post of an interview I did with The Jolly Swagmen Podcast which came out this week. I recommend regular listeners skip to 24 minutes in to avoid hearing things they already know. Later in the episode I talk about my contrarian views, utilitarianism, how 80,000 Hours has changed and will change in the future, where I think EA is performing worst, how to use social media most effectively, and whether or not effective altruism is any sacrifice.

Subscribe and get the episode by searching for '80,000 Hours' in your podcasting app.

Blog post of the episode to share, including a list of topics and links to learn more.

"Most people want to help others with their career, but what’s the best way to do that? Become a doctor? A politician? Work at a non-profit? How can any of us figure out the best way to use our skills to improve the world?

Rob Wiblin is the Director of Research at 80,000 Hours, an organisation founded in Oxford in 2011, which aims to answer just this question and help talented people find their highest-impact career path. He hosts a popular podcast on ‘the world’s most pressing problems and how you can use your career to solve them’.

After seven years of research, the 80,000 Hours team recommends against becoming a teacher, or a doctor, or working at most non-profits. And they claim their research shows some common careers do 10 or 100x as much good as others.

80,000 Hours was one of the organisations that kicked off the effective altruism movement, was a Y Combinator-backed non-profit, and has already shifted over 80 million career hours through its advice.

Joe caught up with Rob in Berkeley, California, to discuss how 80,000 Hours assesses which of the world’s problems are most pressing, how you can build career capital and succeed in any role, and why you could easily save more lives than a doctor - if you think carefully about your impact."

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: search for '80,000 Hours' in your podcasting app.

The 80,000 Hours Podcast is produced by Keiran Harris.

8 Jun 2018 · 1h 31min

#34 - We use the worst voting system that exists. Here's how Aaron Hamlin is going to fix it.

In 1991 Edwin Edwards won the Louisiana gubernatorial election. In 2001, he was found guilty of racketeering and received a 10-year invitation to Federal prison. The strange thing about that election? By 1991 Edwards was already notorious for his corruption. Actually, that’s not it. The truly strange thing is that Edwards was clearly the good guy in the race. How is that possible? His opponent was former Ku Klux Klan Grand Wizard David Duke.

How could Louisiana end up having to choose between a criminal and a Nazi sympathiser? It’s not like they lacked other options: the state’s moderate incumbent governor Buddy Roemer ran for re-election. Polling showed that Roemer was massively preferred to both the career criminal and the career bigot, and would easily win a head-to-head election against either. Unfortunately, in Louisiana every candidate from every party competes in the first round, and the top two then go on to a second - a so-called ‘jungle primary’. Vote splitting squeezed out the middle, and meant that Roemer was eliminated in the first round. Louisiana voters were left with only terrible options, in a run-off election mostly remembered for the proliferation of bumper stickers reading “Vote for the Crook. It’s Important.”

We could look at this as a cultural problem, exposing widespread enthusiasm for bribery and racism that will take generations to overcome. But according to Aaron Hamlin, Executive Director of The Center for Election Science (CES), there’s a simple way to make sure we never have to elect someone hated by more than half the electorate: change how we vote.

He advocates an alternative voting method called approval voting, in which you can vote for as many candidates as you want, not just one. That means that you can always support your honest favorite candidate, even when an election seems like a choice between the lesser of two evils (there’s a toy tally illustrating the difference after this description).

Full transcript, links to learn more, and summary of key points.

If you'd like to meet Aaron he's doing events for CES in San Francisco, DC, Philadelphia, New York and Brooklyn over the next two weeks - RSVP here.

While it might not seem sexy, this single change could transform politics. Approval voting is adored by voting researchers, who regard it as the best simple voting system available. Which do they regard as unquestionably the worst? First-past-the-post - precisely the disastrous system used and exported around the world by the US and UK. Aaron has a practical plan to spread approval voting across the US using ballot initiatives - and it just might be our best shot at making politics a bit less unreasonable.

The Center for Election Science is a U.S. non-profit which aims to fix broken government by helping the world adopt smarter election systems. They recently received a $600,000 grant from the Open Philanthropy Project to scale up their efforts.

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: search for '80,000 Hours' in your podcasting app.

The 80,000 Hours Podcast is produced by Keiran Harris.
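Here is that toy tally, with entirely made-up ballots. Each ballot lists every candidate the voter approves of, favourite first: plurality counts only the favourite, while approval counts every name on the ballot.

```python
from collections import Counter

# A toy illustration (ballots invented) of how approval voting can rescue a
# broadly liked candidate whom plurality, or a jungle primary's first round,
# squeezes out.

ballots = (
    [["Edwards", "Roemer"]] * 5   # Edwards supporters who also accept Roemer
    + [["Duke", "Roemer"]] * 4    # Duke supporters who also accept Roemer
    + [["Roemer"]] * 3            # voters who approve only Roemer
)

plurality = Counter(ballot[0] for ballot in ballots)
approval = Counter(name for ballot in ballots for name in ballot)

print(plurality.most_common())  # Edwards 5, Duke 4, Roemer 3: Roemer knocked out
print(approval.most_common())   # Roemer 12, Edwards 5, Duke 4: consensus pick wins
```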

1 Jun 2018 · 2h 18min

#33 - Anders Sandberg on what if we ended ageing, solar flares & the annual risk of nuclear war

Joseph Stalin had a life-extension program dedicated to making himself immortal. What if he had succeeded?

According to our last guest, Bryan Caplan, there’s an 80% chance that Stalin would still be ruling Russia today. Today’s guest disagrees. Like Stalin, he has his eye on his own immortality - including an insurance plan that will cover the cost of cryogenically freezing himself after he dies - and thinks the technology to achieve it might be around the corner. Fortunately for humanity though, that guest is probably one of the nicest people on the planet: Dr Anders Sandberg of Oxford University.

Full transcript of the conversation, summary, and links to learn more.

The potential availability of technology to delay or even stop ageing means this disagreement matters, so he has been trying to model what would really happen if both the very best and the very worst people in the world could live forever - among many other questions.

Anders, who studies low-probability high-stakes risks and the impact of technological change at the Future of Humanity Institute, is the first guest to appear twice on the 80,000 Hours Podcast and might just be the most interesting academic at Oxford. His research interests include more or less everything, and bucking the academic trend towards intense specialization has earned him a devoted fan base.

Last time we asked him why we don’t see aliens, and how to most efficiently colonise the universe. In today’s episode we ask about Anders’ other recent papers, including:

  • Is it worth the money to freeze your body after death in the hope of future revival, like Anders has done?
  • How much is our perception of the risk of nuclear war biased by the fact that we wouldn’t be alive to think about it had one happened?
  • If biomedical research lets us slow down ageing, would culture stagnate under the crushing weight of centenarians?
  • What long-shot drugs can people take in their 70s to stave off death?
  • Can science extend human (waking) life by cutting our need to sleep?
  • How bad would it be if a solar flare took down the electricity grid? Could it happen?
  • If you’re a scientist and you discover something exciting but dangerous, when should you keep it a secret and when should you share it?
  • Will lifelike robots make us more inclined to dehumanise one another?

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: search for '80,000 Hours' in your podcasting app.

The 80,000 Hours Podcast is produced by Keiran Harris.

29 May 2018 · 1h 24min
