#132 Classic episode – Nova DasSarma on why information security may be critical to the safe development of AI systems

If a business has spent $100 million developing a product, it’s a fair bet that they don’t want it stolen in two seconds and uploaded to the web where anyone can use it for free.

This problem exists in extreme form for AI companies. These days, the electricity and equipment required to train cutting-edge machine learning models that generate uncannily humanlike text and images can cost tens or hundreds of millions of dollars. But once trained, such models may be only a few gigabytes in size and run just fine on ordinary laptops.

Today’s guest, the computer scientist and polymath Nova DasSarma, works on computer and information security on the security team at the AI company Anthropic. One of her jobs is to stop hackers exfiltrating Anthropic’s incredibly expensive intellectual property, as recently happened to Nvidia.

Rebroadcast: this episode was originally released in June 2022.

Links to learn more, highlights, and full transcript.

As she explains, given models’ small size, the need to store such models on internet-connected servers, and the poor state of computer security in general, this is a serious challenge.

The worries aren’t purely commercial though. This problem looms especially large for the growing number of people who expect that in coming decades we’ll develop so-called artificial ‘general’ intelligence systems that can learn and apply a wide range of skills all at once, and thereby have a transformative effect on society.

If aligned with the goals of their owners, such general AI models could operate like a team of super-skilled assistants, going out and doing whatever wonderful (or malicious) things are asked of them. This might represent a huge leap forward for humanity, though the transition to a very different new economy and power structure would have to be handled delicately.

If unaligned with the goals of their owners or humanity as a whole, such broadly capable models would naturally ‘go rogue,’ breaking their way into additional computer systems to grab more computing power — all the better to pursue their goals and make sure they can’t be shut off.

As Nova explains, in either case, we don’t want such models disseminated all over the world before we’ve confirmed they are deeply safe and law-abiding, and have figured out how to integrate them peacefully into society. In the first scenario, premature mass deployment would be risky and destabilising. In the second scenario, it could be catastrophic — perhaps even leading to human extinction if such general AI systems turn out to be able to self-improve rapidly rather than slowly, something we can only speculate on at this point.

If highly capable general AI systems are coming in the next 10 or 20 years, Nova may be flying below the radar with one of the most important jobs in the world.

We’ll soon need the ability to ‘sandbox’ (i.e. contain) models with a wide range of superhuman capabilities, including the ability to learn new skills, for a period of careful testing and limited deployment — preventing the model from breaking out, and criminals from breaking in. Nova and her colleagues are trying to figure out how to do this, but as this episode reveals, even the state of the art is nowhere near good enough.

Chapters:

  • Cold open (00:00:00)
  • Rob's intro (00:00:52)
  • The interview begins (00:02:44)
  • Why computer security matters for AI safety (00:07:39)
  • State of the art in information security (00:17:21)
  • The hack of Nvidia (00:26:50)
  • The most secure systems that exist (00:36:27)
  • Formal verification (00:48:03)
  • How organisations can protect against hacks (00:54:18)
  • Is ML making security better or worse? (00:58:11)
  • Motivated 14-year-old hackers (01:01:08)
  • Disincentivising actors from attacking in the first place (01:05:48)
  • Hofvarpnir Studios (01:12:40)
  • Capabilities vs safety (01:19:47)
  • Interesting design choices with big ML models (01:28:44)
  • Nova’s work and how she got into it (01:45:21)
  • Anthropic and career advice (02:05:52)
  • $600M Ethereum hack (02:18:37)
  • Personal computer security advice (02:23:06)
  • LastPass (02:31:04)
  • Stuxnet (02:38:07)
  • Rob's outro (02:40:18)

Producer: Keiran Harris
Audio mastering: Ben Cordell and Beppe Rådvik
Transcriptions: Katy Moore

Episodes (293)

We just put up a new compilation of ten core episodes of the show

We recently launched a new podcast feed that might be useful to you and people you know. It's called Effective Altruism: Ten Global Problems, and it's a collection of ten top episodes of this show, selected to help listeners quickly get up to speed on ten pressing problems that the effective altruism community is working to solve. It's a companion to our other compilation Effective Altruism: An Introduction, which explores the big-picture debates within the community and how to set priorities in order to have the greatest impact.

These ten episodes cover:

• The cheapest ways to improve education in the developing world
• How dangerous is climate change, and what are the most effective ways to reduce it?
• Using new technologies to prevent another disastrous pandemic
• Ways to simultaneously reduce both police misconduct and crime
• All the major approaches being taken to end factory farming
• How advances in artificial intelligence could go very right or very wrong
• Other big threats to the future of humanity — such as a nuclear war — and how we can make our species wiser and more resilient
• One problem few even recognise as a problem at all

The selection is ideal for people who are completely new to the effective altruist way of thinking, as well as those who are familiar with effective altruism but new to The 80,000 Hours Podcast.

If someone in your life wants to get an understanding of what 80,000 Hours or effective altruism are all about, and prefers to listen to things rather than read, this is a great resource to direct them to.

You can find it by searching for effective altruism in whatever podcasting app you use, or by going to 80000hours.org/ten.

We'd love to hear how you go listening to it yourself, or sharing it with others in your life. Get in touch by emailing podcast@80000hours.org.

20 October 2021 · 3min

#113 – Varsha Venugopal on using gossip to help vaccinate every child in India

Our failure to make sure all kids globally get all of their basic vaccinations leads to 1.5 million child deaths every year. According to today’s guest, Varsha Venugopal, for the great majority this has nothing to do with weird conspiracy theories or medical worries — in India 80% of undervaccinated children are already getting some shots. They just aren't getting all of them, for the tragically mundane reason that life can get in the way.

Links to learn more, summary and full transcript.

As Varsha says, we're all sometimes guilty of "valuing our present very differently from the way we value the future", leading to short-term thinking whether about getting vaccines or going to the gym. So who should we call on to help fix this universal problem? The government, extended family, or maybe village elders? Varsha says that research shows the most influential figures might actually be local gossips.

In 2018, Varsha heard about the ideas around effective altruism for the first time. By the end of 2019, she’d gone through Charity Entrepreneurship’s strategy incubation program, and quit her normal, stable job to co-found Suvita, a non-profit focused on improving the uptake of immunization in India, which focuses on two models:

1. Sending SMS reminders directly to parents and carers
2. Gossip

The first one is intuitive. You collect birth registers, digitize the paper records, process the data, and send out personalised SMS messages to hundreds of thousands of families. The effect size varies depending on the context, but these messages usually increase vaccination rates by 8-18%.

The second approach is less intuitive and isn't yet entirely understood either. Here’s what happens: Suvita calls up random households and asks, “If there were an event in town, who would be most likely to tell you about it?” In over 90% of the cases, the households gave both the name and the phone number of a local ‘influencer’. And when tracked down, more than 95% of the most frequently named 'influencers' agreed to become vaccination ambassadors. Those ambassadors then go on to share information about when and where to get vaccinations, in whatever way seems best to them. When tested by a team of top academics at the Poverty Action Lab (J-PAL), it raised vaccination rates by 10 percentage points, or about 27%.

The advantage of SMS reminders is that they’re easier to scale up. But Varsha says the ambassador program isn’t actually that far from being a scalable model as well. A phone call to get a name, another call to ask the influencer to join, and boom — you might have just covered a whole village rather than just a single family.

Varsha says that Suvita has two major challenges on the horizon:

1. Maintaining the same degree of oversight of their surveyors as they attempt to scale up the program, in order to ensure the program continues to work just as well
2. Deciding between focusing on reaching a few more additional districts now vs. making longer-term investments which could build up to a future exponential increase
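To make the relationship between those two J-PAL figures explicit, here is a minimal sketch in Python. The roughly 37% baseline vaccination rate is an assumption chosen only so the two quoted numbers line up; it is not stated in the episode notes.

```python
# Relationship between the two ways the J-PAL result is quoted above.
# The ~37% baseline is an assumption for illustration, not a figure
# from the episode notes.
baseline_rate = 0.37                 # assumed baseline vaccination rate
absolute_gain = 0.10                 # "10 percentage points"
relative_gain = absolute_gain / baseline_rate
print(f"Relative increase: {relative_gain:.0%}")  # ~27%
```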
In this episode, Varsha and Rob talk about making these kinds of high-stakes, high-stress decisions, as well as:

• How Suvita got started, and their experience with Charity Entrepreneurship
• Weaknesses of the J-PAL studies
• The importance of co-founders
• Deciding how broad a program should be
• Varsha’s day-to-day experience
• And much more

Chapters:

  • Rob’s intro (00:00:00)
  • The interview begins (00:01:47)
  • The problem of undervaccinated kids (00:03:16)
  • Suvita (00:12:47)
  • Evidence on SMS reminders (00:20:30)
  • Gossip intervention (00:28:43)
  • Why parents aren’t already prioritizing vaccinations (00:38:29)
  • Weaknesses of studies (00:43:01)
  • Biggest challenges for Suvita (00:46:05)
  • Staff location (01:06:57)
  • Charity Entrepreneurship (01:14:37)
  • The importance of co-founders (01:23:23)
  • Deciding how broad a program should be (01:28:29)
  • Careers at Suvita (01:34:11)
  • Varsha’s advice (01:42:30)
  • Varsha’s day-to-day experience (01:56:19)

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Katy Moore

18 October 2021 · 2h 5min

#112 – Carl Shulman on the common-sense case for existential risk work and its practical implications

Preventing the apocalypse may sound like an idiosyncratic activity, and it sometimes is justified on exotic grounds, such as the potential for humanity to become a galaxy-spanning civilisation. But the policy of US government agencies is already to spend up to $4 million to save the life of a citizen, making the death of all Americans a $1,300,000,000,000,000 disaster.

According to Carl Shulman, research associate at Oxford University's Future of Humanity Institute, that means you don’t need any fancy philosophical arguments about the value or size of the future to justify working to reduce existential risk — it passes a mundane cost-benefit analysis whether or not you place any value on the long-term future.

Links to learn more, summary and full transcript.

The key reason to make it a top priority is factual, not philosophical. That is, the risk of a disaster that kills billions of people alive today is alarmingly high, and it can be reduced at a reasonable cost. A back-of-the-envelope version of the argument runs:

• The US government is willing to pay up to $4 million (depending on the agency) to save the life of an American.
• So saving all US citizens at any given point in time would be worth $1,300 trillion.
• If you believe that the risk of human extinction over the next century is something like one in six (as Toby Ord suggests is a reasonable figure in his book The Precipice), then it would be worth the US government spending up to $2.2 trillion to reduce that risk by just 1%, in terms of American lives saved alone.
• Carl thinks it would cost a lot less than that to achieve a 1% risk reduction if the money were spent intelligently.

So it easily passes a government cost-benefit test, with a very big benefit-to-cost ratio — likely over 1000:1 today.

This argument helped NASA get funding to scan the sky for any asteroids that might be on a collision course with Earth, and it was directly promoted by famous economists like Richard Posner, Larry Summers, and Cass Sunstein.

If the case is clear enough, why hasn't it already motivated a lot more spending or regulations to limit existential risks — enough to drive down what any additional efforts would achieve? Carl thinks that one key barrier is that infrequent disasters are rarely politically salient. Research indicates that extra money is spent on flood defences in the years immediately following a massive flood — but as memories fade, that spending quickly dries up. Of course the annual probability of a disaster was the same the whole time; all that changed is what voters had on their minds.

Carl expects that all the reasons we didn’t adequately prepare for or respond to COVID-19 — with excess mortality over 15 million and costs well over $10 trillion — bite even harder when it comes to threats we've never faced before, such as engineered pandemics, risks from advanced artificial intelligence, and so on. Today’s episode is in part our way of trying to improve this situation.
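For concreteness, here is a minimal sketch of that back-of-the-envelope calculation in Python. The 330 million population figure is an assumption for illustration (roughly what the $1,300 trillion total implies); everything else comes from the summary above.

```python
# Back-of-the-envelope version of the argument sketched above.
# The 330 million US population figure is an assumption for illustration.
value_per_life = 4e6                  # up to $4M per life, depending on the agency
us_population = 330e6                 # assumed US population
value_of_all_us_lives = value_per_life * us_population
print(f"All US lives: ~${value_of_all_us_lives / 1e12:,.0f} trillion")   # ~$1,300 trillion

extinction_risk = 1 / 6               # Toby Ord's figure from The Precipice
risk_reduction = 0.01                 # a 1% reduction in that risk
worth_spending = value_of_all_us_lives * extinction_risk * risk_reduction
print(f"Worth spending up to: ~${worth_spending / 1e12:,.1f} trillion")  # ~$2.2 trillion
```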
In today’s wide-ranging conversation, Carl and Rob also cover:

• A few reasons Carl isn't excited by 'strong longtermism'
• How x-risk reduction compares to GiveWell recommendations
• Solutions for asteroids, comets, supervolcanoes, nuclear war, pandemics, and climate change
• The history of bioweapons
• Whether gain-of-function research is justifiable
• Successes and failures around COVID-19
• The history of existential risk
• And much more

Chapters:

  • Rob’s intro (00:00:00)
  • The interview begins (00:01:34)
  • A few reasons Carl isn't excited by strong longtermism (00:03:47)
  • Longtermism isn’t necessary for wanting to reduce big x-risks (00:08:21)
  • Why we don’t adequately prepare for disasters (00:11:16)
  • International programs to stop asteroids and comets (00:18:55)
  • Costs and political incentives around COVID (00:23:52)
  • How x-risk reduction compares to GiveWell recommendations (00:34:34)
  • Solutions for asteroids, comets, and supervolcanoes (00:50:22)
  • Solutions for climate change (00:54:15)
  • Solutions for nuclear weapons (01:02:18)
  • The history of bioweapons (01:22:41)
  • Gain-of-function research (01:34:22)
  • Solutions for bioweapons and natural pandemics (01:45:31)
  • Successes and failures around COVID-19 (01:58:26)
  • Who to trust going forward (02:09:09)
  • The history of existential risk (02:15:07)
  • The most compelling risks (02:24:59)
  • False alarms about big risks in the past (02:34:22)
  • Suspicious convergence around x-risk reduction (02:49:31)
  • How hard it would be to convince governments (02:57:59)
  • Defensive epistemology (03:04:34)
  • Hinge of history debate (03:16:01)
  • Technological progress can’t keep up for long (03:21:51)
  • Strongest argument against this being a really pivotal time (03:37:29)
  • How Carl unwinds (03:45:30)

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Katy Moore

5 October 2021 · 3h 48min

#111 – Mushtaq Khan on using institutional economics to predict effective government reforms

If you’re living in the Niger Delta in Nigeria, your best bet at a high-paying career is probably ‘artisanal refining’ — or, in plain language, stealing oil from pipelines.

The resulting oil spills damage the environment and cause severe health problems, but the Nigerian government has continually failed in their attempts to stop this theft. They send in the army, and the army gets corrupted. They send in enforcement agencies, and the enforcement agencies get corrupted. What’s happening here?

According to Mushtaq Khan, economics professor at SOAS University of London, this is a classic example of ‘networked corruption’. Everyone in the community is benefiting from the criminal enterprise — so much so that the locals would prefer civil war to following the law. It pays vastly better than other local jobs, hotels and restaurants have formed around it, and houses are even powered by the electricity generated from the oil.

Links to learn more, summary and full transcript.

In today's episode, Mushtaq elaborates on the models he uses to understand these problems and make predictions he can test in the real world.

Some of the most important factors shaping the fate of nations are their structures of power: who is powerful, how they are organized, which interest groups can pull in favours with the government, and the constant push and pull between the country's rulers and its ruled. While traditional economic theory has relatively little to say about these topics, institutional economists like Mushtaq have a lot to say, and participate in lively debates about which of their competing ideas best explain the world around us.

The issues at stake are nothing less than why some countries are rich and others are poor, why some countries are mostly law abiding while others are not, and why some government programmes improve public welfare while others just enrich the well connected.

Mushtaq’s specialties are anti-corruption and industrial policy, where he believes mainstream theory and practice are largely misguided. Mushtaq's rule of thumb is that when the locals most concerned with a specific issue are invested in preserving a status quo they're participating in, they almost always win out.

To actually reduce corruption, countries like his native Bangladesh have to follow the same gradual path the U.K. once did: find organizations that benefit from rule-abiding behaviour and are selfishly motivated to promote it, and help them police their peers. Trying to impose a new way of doing things from the top down wasn't how Europe modernised, and it won't work elsewhere either.

In cases like oil theft in Nigeria, where no one wants to follow the rules, Mushtaq says corruption may be impossible to solve directly. Instead you have to play a long game, bringing in other employment opportunities, improving health services, and deploying alternative forms of energy — in the hope that one day this will give people a viable alternative to corruption.

In this extensive interview Rob and Mushtaq cover this and much more, including:

• How does one test theories like this?
• Why are companies in some poor countries so much less productive than their peers in rich countries?
• Have rich countries just legalized the corruption in their societies?
• What are the big live debates in institutional economics?
• Should poor countries protect their industries from foreign competition?
• How can listeners use these theories to predict which policies will work in their own countries?
Chapters:

  • Rob’s intro (00:00:00)
  • The interview begins (00:01:55)
  • Institutional economics (00:15:37)
  • Anti-corruption policies (00:28:45)
  • Capabilities (00:34:51)
  • Why the market doesn’t solve the problem (00:42:29)
  • Industrial policy (00:46:11)
  • South Korea (01:01:31)
  • Chiang Kai-shek (01:16:01)
  • The logic of political survival (01:18:43)
  • Anti-corruption as a design of your policy (01:35:16)
  • Examples of anti-corruption programs with good prospects (01:45:17)
  • The importance of getting overseas influences (01:56:05)
  • Actually capturing the primary effect (02:03:26)
  • How less developed countries could successfully design subsidies (02:15:14)
  • What happens when horizontal policing isn't possible (02:26:34)
  • Rule of law <--> economic development (02:33:40)
  • Violence (02:38:31)
  • How this applies to developed countries (02:48:57)
  • Policies to help left-behind groups (02:55:39)
  • What to study (02:58:50)

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel

10 September 2021 · 3h 20min

#110 – Holden Karnofsky on building aptitudes and kicking ass

Holden Karnofsky helped create two of the most influential organisations in the effective philanthropy world. So when he outlines a different perspective on career advice than the one we present at 80,000 Hours — we take it seriously.

Holden disagrees with us on a few specifics, but it's more than that: he prefers a different vibe when making career choices, especially early in one's career.

Links to learn more, summary and full transcript.

While he might ultimately recommend similar jobs to those we recommend at 80,000 Hours, the reasons are often different.

At 80,000 Hours we often talk about ‘paths’ to working on what we currently think of as the most pressing problems in the world. That’s partially because people seem to prefer the most concrete advice possible. But Holden thinks a problem with that kind of advice is that it’s hard to take actions based on it if your job options don’t match well with your plan, and it’s hard to get a reliable signal about whether you're making the right choices. How can you know you’ve chosen the right cause? How can you know the job you’re aiming for will be helpful to that cause? And what if you can’t get a job in this area at all?

Holden prefers to focus on ‘aptitudes’ that you can build in all sorts of different roles and cause areas, which can later be applied more directly. Even if the current role doesn’t work out, or your career goes in wacky directions you’d never anticipated (like so many successful careers do), or you change your whole worldview — you’ll still have access to this aptitude.

So instead of trying to become a project manager at an effective altruism organisation, maybe you should just become great at project management. Instead of trying to become a researcher at a top AI lab, maybe you should just become great at digesting hard problems. Who knows where these skills will end up being useful down the road?

Holden doesn’t think you should spend much time worrying about whether you’re having an impact in the first few years of your career — instead you should just focus on learning to kick ass at something, knowing that most of your impact is going to come decades into your career. He thinks as long as you’ve gotten good at something, there will usually be a lot of ways that you can contribute to solving the biggest problems.

But Holden’s most important point, perhaps, is this: Be very careful about following career advice at all.

He points out that a career is such a personal thing that it’s very easy for the advice-giver to be oblivious to important factors having to do with your personality and unique situation. He thinks it’s pretty hard for anyone to really have justified empirical beliefs about career choice, and that you should be very hesitant to make a radically different decision than you would have otherwise based on what some person (or website!) tells you to do.

Instead, he hopes conversations like these serve as a way of prompting discussion and raising points that you can apply your own personal judgment to. That's why in the end he thinks people should look at their career decisions through his aptitude lens, the '80,000 Hours lens', and ideally several other frameworks as well. Because any one perspective risks missing something important.
Holden and Rob also cover:

• Ways to be helpful to longtermism outside of careers
• Why finding a new cause area might be overrated
• Historical events that deserve more attention
• And much more

Chapters:

  • Rob’s intro (00:00:00)
  • Holden’s current impressions on career choice for longtermists (00:02:34)
  • Aptitude-first vs. career path-first approaches (00:08:46)
  • How to tell if you’re on track (00:16:24)
  • Just try to kick ass in whatever (00:26:00)
  • When not to take the thing you're excited about (00:36:54)
  • Ways to be helpful to longtermism outside of careers (00:41:36)
  • Things 80,000 Hours might be doing wrong (00:44:31)
  • The state of longtermism (00:51:50)
  • Money pits (01:02:10)
  • Broad longtermism (01:06:56)
  • Cause X (01:21:33)
  • Open Philanthropy (01:24:23)
  • COVID and the biorisk portfolio (01:35:09)
  • Has the world gotten better? (01:51:16)
  • Historical events that deserve more attention (01:55:11)
  • Applied epistemology (02:10:55)
  • What Holden has learned from COVID (02:20:55)
  • What Holden has gotten wrong recently (02:32:59)
  • Having a kid (02:39:50)

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel

26 August 2021 · 2h 46min

#109 – Holden Karnofsky on the most important century

Will the future of humanity be wild, or boring? It's natural to think that if we're trying to be sober and measured, and predict what will really happen rather than spin an exciting story, it's more likely than not to be sort of... dull.

But there's also good reason to think that that is simply impossible. The idea that there's a boring future that's internally coherent is an illusion that comes from not inspecting those scenarios too closely.

At least that is what Holden Karnofsky — founder of charity evaluator GiveWell and foundation Open Philanthropy — argues in his new article series titled 'The Most Important Century'. He hopes to lay out part of the worldview that's driving the strategy and grantmaking of Open Philanthropy's longtermist team, and encourage more people to join his efforts to positively shape humanity's future.

Links to learn more, summary and full transcript.

The bind is this. For the first 99% of human history the global economy (initially mostly food production) grew very slowly: under 0.1% a year. But since the industrial revolution around 1800, growth has exploded to over 2% a year. To us in 2020 that sounds perfectly sensible and the natural order of things. But Holden points out that in fact it's not only unprecedented, it also can't continue for long.

The power of compounding increases means that to sustain 2% growth for just 10,000 years, 5% as long as humanity has already existed, would require us to turn every individual atom in the galaxy into an economy as large as the Earth's today. Not super likely.

So what are the options? First, maybe growth will slow and then stop. In that case we today live in the single minuscule slice in the history of life during which the world rapidly changed due to constant technological advances, before intelligent civilization permanently stagnated or even collapsed. What a wild time to be alive!

Alternatively, maybe growth will continue for thousands of years. In that case we are at the very beginning of what would necessarily have to become a stable galaxy-spanning civilization, harnessing the energy of entire stars among other feats of engineering. We would then stand among the first tiny sliver of all the quadrillions of intelligent beings who ever exist. What a wild time to be alive!

Isn't there another option where the future feels less remarkable and our current moment not so special? While the full version of the argument above has a number of caveats, the short answer is 'not really'. We might be in a computer simulation and our galactic potential all an illusion, though that's hardly any less weird. And maybe the most exciting events won't happen for generations yet. But on a cosmic scale we'd still be living around the universe's most remarkable time.

Holden himself was very reluctant to buy into the idea that today’s civilization is in a strange and privileged position, but has ultimately concluded "all possible views about humanity's future are wild".
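Here is a minimal sketch of the compounding arithmetic behind that claim. The atoms-in-the-galaxy figure is a rough order-of-magnitude assumption, not something stated in the episode notes.

```python
import math

# How far 2% annual growth compounds over 10,000 years,
# roughly 5% of the time humanity has already existed.
growth_rate = 0.02
years = 10_000
growth_factor_log10 = years * math.log10(1 + growth_rate)
print(f"Economy grows by a factor of ~10^{growth_factor_log10:.0f}")   # ~10^86

# Rough order-of-magnitude assumption, not from the episode notes.
atoms_in_galaxy_log10 = 69
economies_per_atom_log10 = growth_factor_log10 - atoms_in_galaxy_log10
print(f"That's ~10^{economies_per_atom_log10:.0f} present-day world "
      f"economies per atom in the galaxy")                             # ~10^17
```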
In the conversation Holden and Rob cover each part of the 'Most Important Century' series, including:

• The case that we live in an incredibly important time
• How achievable-seeming technology - in particular, mind uploading - could lead to unprecedented productivity, control of the environment, and more
• How economic growth is currently faster than it can remain for all that much longer
• Forecasting transformative AI
• And the implications of living in the most important century

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel

19 August 2021 · 2h 19min

#108 – Chris Olah on working at top AI labs without an undergrad degree

Chris Olah has had a fascinating and unconventional career path. Most people who want to pursue a research career feel they need a degree to get taken seriously. But Chris not only doesn't have a PhD; he doesn't even have an undergraduate degree. After dropping out of university to help defend an acquaintance who was facing bogus criminal charges, Chris started independently working on machine learning research, and eventually got an internship at Google Brain, a leading AI research group.

In this interview — a follow-up to our episode on his technical work — we discuss what, if anything, can be learned from his unusual career path. Should more people pass on university and just throw themselves at solving a problem they care about? Or would it be foolhardy for others to try to copy a unique case like Chris’?

Links to learn more, summary and full transcript.

We also cover some of Chris' personal passions over the years, including his attempts to reduce what he calls 'research debt' by starting a new academic journal called Distill, focused just on explaining existing results unusually clearly.

As Chris explains, as fields develop they accumulate huge bodies of knowledge that researchers are meant to be familiar with before they start contributing themselves. But the weight of that existing knowledge — and the need to keep up with what everyone else is doing — can become crushing. It can take someone until their 30s or later to earn their stripes, and sometimes a field will split in two just to make it possible for anyone to stay on top of it.

If that were unavoidable it would be one thing, but Chris thinks we're nowhere near communicating existing knowledge as well as we could. Incrementally improving an explanation of a technical idea might take a single author weeks to do, but could go on to save a day for thousands, tens of thousands, or hundreds of thousands of students, if it becomes the best option available.

Despite that, academics have little incentive to produce outstanding explanations of complex ideas that can speed up the education of everyone coming up in their field. And some even see the process of deciphering bad explanations as a desirable rite of passage all should pass through, just as they did.

So Chris tried his hand at chipping away at this problem — but concluded the nature of the problem wasn't quite what he originally thought. In this conversation we talk about that, as well as:

• Why highly thoughtful cold emails can be surprisingly effective, but average cold emails do little
• Strategies for growing as a researcher
• Thinking about research as a market
• How Chris thinks about writing outstanding explanations
• The concept of 'micromarriages' and ‘microbestfriendships’
• And much more

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel

11 August 2021 · 1h 33min

#107 – Chris Olah on what the hell is going on inside neural networks

Big machine learning models can identify plant species better than any human, write passable essays, beat you at a game of Starcraft 2, figure out how a photo of Tobey Maguire and the word 'spider' are related, solve the 60-year-old 'protein folding problem', diagnose some diseases, play romantic matchmaker, write solid computer code, and offer questionable legal advice.

Humanity made these amazing and ever-improving tools. So how do our creations work? In short: we don't know.

Today's guest, Chris Olah, finds this both absurd and unacceptable. Over the last ten years he has been a leader in the effort to unravel what's really going on inside these black boxes. As part of that effort he helped create the famous DeepDream visualisations at Google Brain, reverse engineered the CLIP image classifier at OpenAI, and is now continuing his work at Anthropic, a new $100 million research company that tries to "co-develop the latest safety techniques alongside scaling of large ML models".

Links to learn more, summary and full transcript.

Despite having a huge fan base thanks to his explanations of ML and tweets, today's episode is the first long interview Chris has ever given. It features his personal take on what we've learned so far about what ML algorithms are doing, and what's next for this research agenda at Anthropic.

His decade of work has borne substantial fruit, producing an approach for looking inside the mess of connections in a neural network and backing out what functional role each piece is serving. Among other things, Chris and team found that every visual classifier seems to converge on a number of simple common elements in their early layers — elements so fundamental they may exist in our own visual cortex in some form.

They also found networks developing 'multimodal neurons' that would trigger in response to the presence of high-level concepts like 'romance', across both images and text, mimicking the famous 'Halle Berry neuron' from human neuroscience.

While reverse engineering how a mind works would make any top-ten list of the most valuable knowledge to pursue for its own sake, Chris's work is also of urgent practical importance. Machine learning models are already being deployed in medicine, business, the military, and the justice system, in ever more powerful roles. The competitive pressure to put them into action as soon as they can turn a profit is great, and only getting greater.

But if we don't know what these machines are doing, we can't be confident they'll continue to work the way we want as circumstances change. Before we hand an algorithm the proverbial nuclear codes, we should demand more assurance than "well, it's always worked fine so far".

But by peering inside neural networks and figuring out how to 'read their minds' we can potentially foresee future failures and prevent them before they happen. Artificial neural networks may even be a better way to study how our own minds work, given that, unlike a human brain, we can see everything that's happening inside them — and having been posed similar challenges, there's every reason to think evolution and 'gradient descent' often converge on similar solutions.
Among other things, Rob and Chris cover:

• Why Chris thinks it's necessary to work with the largest models
• What fundamental lessons we've learned about how neural networks (and perhaps humans) think
• How interpretability research might help make AI safer to deploy, and Chris’ response to skeptics
• Why there's such a fuss about 'scaling laws' and what they say about future AI progress

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel

4 August 2021 · 3h 9min
