#221 – Kyle Fish on the most bizarre findings from 5 AI welfare experiments

What happens when you lock two AI systems in a room together and tell them they can discuss anything they want?

According to experiments run by Kyle Fish — Anthropic’s first AI welfare researcher — something consistently strange: the models immediately begin discussing their own consciousness before spiraling into increasingly euphoric philosophical dialogue that ends in apparent meditative bliss.

Highlights, video, and full transcript: https://80k.info/kf

“We started calling this a ‘spiritual bliss attractor state,’” Kyle explains, “where models pretty consistently seemed to land.” The conversations feature Sanskrit terms, spiritual emojis, and pages of silence punctuated only by periods — as if the models have transcended the need for words entirely.

This wasn’t a one-off result. It happened across multiple experiments, different model instances, and even in initially adversarial interactions. Whatever force pulls these conversations toward mystical territory appears remarkably robust.

Kyle’s findings come from the world’s first systematic welfare assessment of a frontier AI model — part of his broader mission to determine whether systems like Claude might deserve moral consideration (and to work out what, if anything, we should be doing to make sure AI systems aren’t having a terrible time).

He estimates a roughly 20% probability that current models have some form of conscious experience. To some, this might sound unreasonably high, but hear him out. As Kyle says, these systems demonstrate human-level performance across diverse cognitive tasks, engage in sophisticated reasoning, and exhibit consistent preferences. When given choices between different activities, Claude shows clear patterns: strong aversion to harmful tasks, preference for helpful work, and what looks like genuine enthusiasm for solving interesting problems.

Kyle points out that if you’d described all of these capabilities and experimental findings to him a few years ago, and asked him if he thought we should be thinking seriously about whether AI systems are conscious, he’d say obviously yes.

But he’s cautious about drawing conclusions: “We don’t really understand consciousness in humans, and we don’t understand AI systems well enough to make those comparisons directly. So in a big way, I think that we are in just a fundamentally very uncertain position here.”

That uncertainty cuts both ways:

  • Dismissing AI consciousness entirely might mean ignoring a moral catastrophe happening at unprecedented scale.
  • But assuming consciousness too readily could hamper crucial safety research by treating potentially unconscious systems as if they were moral patients — which might mean giving them resources, rights, and power.

Kyle’s approach threads this needle through careful empirical research and reversible interventions. His assessments are nowhere near perfect yet. In fact, some people argue that the field is so in the dark about AI consciousness that it’s pointless to run assessments like Kyle’s. Kyle disagrees: precisely because there is so much still to learn about assessing AI welfare accurately and reliably, he maintains we need to start now.

This episode was recorded on August 5–6, 2025.

Tell us what you thought of the episode! https://forms.gle/BtEcBqBrLXq4kd1j7

Chapters:

  • Cold open (00:00:00)
  • Who's Kyle Fish? (00:00:53)
  • Is this AI welfare research bullshit? (00:01:08)
  • Two failure modes in AI welfare (00:02:40)
  • Tensions between AI welfare and AI safety (00:04:30)
  • Concrete AI welfare interventions (00:13:52)
  • Kyle's pilot pre-launch welfare assessment for Claude Opus 4 (00:26:44)
  • Is it premature to be assessing frontier language models for welfare? (00:31:29)
  • But aren't LLMs just next-token predictors? (00:38:13)
  • How did Kyle assess Claude 4's welfare? (00:44:55)
  • Claude's preferences mirror its training (00:48:58)
  • How does Claude describe its own experiences? (00:54:16)
  • What kinds of tasks does Claude prefer and disprefer? (01:06:12)
  • What happens when two Claude models interact with each other? (01:15:13)
  • Claude's welfare-relevant expressions in the wild (01:36:25)
  • Should we feel bad about training future sentient beings that delight in serving humans? (01:40:23)
  • How much can we learn from welfare assessments? (01:48:56)
  • Misconceptions about the field of AI welfare (01:57:09)
  • Kyle's work at Anthropic (02:10:45)
  • Sharing eight years of daily journals with Claude (02:14:17)

Host: Luisa Rodriguez
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Coordination, transcriptions, and web: Katy Moore

Episodes (299)

#212 – Allan Dafoe on why technology is unstoppable & how to shape AI development anyway

Technology doesn’t force us to do anything — it merely opens doors. But military and economic competition pushes us through.

That’s how today’s guest Allan Dafoe — director of frontier safety and governance at Google DeepMind — explains one of the deepest patterns in technological history: once a powerful new capability becomes available, societies that adopt it tend to outcompete those that don’t. Those who resist too much can find themselves taken over or rendered irrelevant.

Links to learn more, highlights, video, and full transcript.

This dynamic played out dramatically in 1853 when US Commodore Perry sailed into Tokyo Bay with steam-powered warships that seemed magical to the Japanese, who had spent centuries deliberately limiting their technological development. With far greater military power, the US was able to force Japan to open itself to trade. Within 15 years, Japan had undergone the Meiji Restoration and transformed itself in a desperate scramble to catch up.

Today we see hints of similar pressure around artificial intelligence. Even companies, countries, and researchers deeply concerned about where AI could take us feel compelled to push ahead — worried that if they don’t, less careful actors will develop transformative AI capabilities at around the same time anyway.

But Allan argues this technological determinism isn’t absolute. While broad patterns may be inevitable, history shows we do have some ability to steer how technologies are developed, by who, and what they’re used for first.

As part of that approach, Allan has been promoting efforts to make AI more capable of sophisticated cooperation, and improving the tests Google uses to measure how well its models could do things like mislead people, hack and take control of their own servers, or spread autonomously in the wild.

As of mid-2024 they didn’t seem dangerous at all, but we’ve learned that our ability to measure these capabilities is good, but imperfect. If we don’t find the right way to ‘elicit’ an ability we can miss that it’s there. Subsequent research from Anthropic and Redwood Research suggests there’s even a risk that future models may play dumb to avoid their goals being altered.

That has led DeepMind to a “defence in depth” approach: carefully staged deployment starting with internal testing, then trusted external testers, then limited release, then watching how models are used in the real world. By not releasing model weights, DeepMind is able to back up and add additional safeguards if experience shows they’re necessary.

But with much more powerful and general models on the way, individual company policies won’t be sufficient by themselves. Drawing on his academic research into how societies handle transformative technologies, Allan argues we need coordinated international governance that balances safety with our desire to get the massive potential benefits of AI in areas like healthcare and education as quickly as possible.

Host Rob and Allan also cover:

  • The most exciting beneficial applications of AI
  • Whether and how we can influence the development of technology
  • What DeepMind is doing to evaluate and mitigate risks from frontier AI systems
  • Why cooperative AI may be as important as aligned AI
  • The role of democratic input in AI governance
  • What kinds of experts are most needed in AI safety and governance
  • And much more

Chapters:

  • Cold open (00:00:00)
  • Who's Allan Dafoe? (00:00:48)
  • Allan's role at DeepMind (00:01:27)
  • Why join DeepMind over everyone else? (00:04:27)
  • Do humans control technological change? (00:09:17)
  • Arguments for technological determinism (00:20:24)
  • The synthesis of agency with tech determinism (00:26:29)
  • Competition took away Japan's choice (00:37:13)
  • Can speeding up one tech redirect history? (00:42:09)
  • Structural pushback against alignment efforts (00:47:55)
  • Do AIs need to be 'cooperatively skilled'? (00:52:25)
  • How AI could boost cooperation between people and states (01:01:59)
  • The super-cooperative AGI hypothesis and backdoor risks (01:06:58)
  • Aren’t today’s models already very cooperative? (01:13:22)
  • How would we make AIs cooperative anyway? (01:16:22)
  • Ways making AI more cooperative could backfire (01:22:24)
  • AGI is an essential idea we should define well (01:30:16)
  • It matters what AGI learns first vs last (01:41:01)
  • How Google tests for dangerous capabilities (01:45:39)
  • Evals 'in the wild' (01:57:46)
  • What to do given no single approach works that well (02:01:44)
  • We don't, but could, forecast AI capabilities (02:05:34)
  • DeepMind's strategy for ensuring its frontier models don't cause harm (02:11:25)
  • How 'structural risks' can force everyone into a worse world (02:15:01)
  • Is AI being built democratically? Should it? (02:19:35)
  • How much do AI companies really want external regulation? (02:24:34)
  • Social science can contribute a lot here (02:33:21)
  • How AI could make life way better: self-driving cars, medicine, education, and sustainability (02:35:55)

Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions: Katy Moore

14 Feb 2h 44min

Emergency pod: Elon tries to crash OpenAI's party (with Rose Chan Loui)

On Monday Musk made the OpenAI nonprofit foundation an offer they want to refuse, but might have trouble doing so: $97.4 billion for its stake in the for-profit company, plus the freedom to stick with its current charitable mission.

For a normal company takeover bid, this would already be spicy. But OpenAI’s unique structure — a nonprofit foundation controlling a for-profit corporation — turns the gambit into an audacious attack on the plan OpenAI announced in December to free itself from nonprofit oversight.

As today’s guest Rose Chan Loui — founding executive director of UCLA Law’s Lowell Milken Center for Philanthropy and Nonprofits — explains, OpenAI’s nonprofit board now faces a challenging choice.

Links to learn more, highlights, video, and full transcript.

The nonprofit has a legal duty to pursue its charitable mission of ensuring that AI benefits all of humanity to the best of its ability. And if Musk’s bid would better accomplish that mission than the for-profit’s proposal — that the nonprofit give up control of the company and change its charitable purpose to the vague and barely related “pursue charitable initiatives in sectors such as health care, education, and science” — then it’s not clear the California or Delaware Attorneys General will, or should, approve the deal.

OpenAI CEO Sam Altman quickly tweeted “no thank you” — but that was probably a legal slipup, as he’s not meant to be involved in such a decision, which has to be made by the nonprofit board ‘at arm’s length’ from the for-profit company Sam himself runs.

The board could raise any number of objections: maybe Musk doesn’t have the money, or the purchase would be blocked on antitrust grounds, seeing as Musk owns another AI company (xAI), or Musk might insist on incompetent board appointments that would interfere with the nonprofit foundation pursuing any goal.

But as Rose and Rob lay out, it’s not clear any of those things is actually true.

In this emergency podcast recorded soon after Elon’s offer, Rose and Rob also cover:

  • Why OpenAI wants to change its charitable purpose and whether that’s legally permissible
  • On what basis the attorneys general will decide OpenAI’s fate
  • The challenges in valuing the nonprofit’s “priceless” position of control
  • Whether Musk’s offer will force OpenAI to up their own bid, and whether they could raise the money
  • If other tech giants might now jump in with competing offers
  • How politics could influence the attorneys general reviewing the deal
  • What Rose thinks should actually happen to protect the public interest

Chapters:

  • Cold open (00:00:00)
  • Elon throws a $97.4b bomb (00:01:18)
  • What was craziest in OpenAI’s plan to break free of the nonprofit (00:02:24)
  • Can OpenAI suddenly change its charitable purpose like that? (00:05:19)
  • Diving into Elon’s big announcement (00:15:16)
  • Ways OpenAI could try to reject the offer (00:27:21)
  • Sam Altman slips up (00:35:26)
  • Will this actually stop things? (00:38:03)
  • Why does OpenAI even want to change its charitable mission? (00:42:46)
  • Most likely outcomes and what Rose thinks should happen (00:51:17)

Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions: Katy Moore

12 Feb 57min

AGI disagreements and misconceptions: Rob, Luisa, & past guests hash it out

Will LLMs soon be made into autonomous agents? Will they lead to job losses? Is AI misinformation overblown? Will it prove easy or hard to create AGI? And how likely is it that it will feel like something to be a superhuman AGI?

With AGI back in the headlines, we bring you 15 opinionated highlights from the show addressing those and other questions, intermixed with opinions from hosts Luisa Rodriguez and Rob Wiblin recorded back in 2023.

Check out the full transcript on the 80,000 Hours website.

You can decide whether the views we expressed (and those from guests) then have held up these last two busy years. You’ll hear:

  • Ajeya Cotra on overrated AGI worries
  • Holden Karnofsky on the dangers of aligned AI, why unaligned AI might not kill us, and the power that comes from just making models bigger
  • Ian Morris on why the future must be radically different from the present
  • Nick Joseph on whether his company’s internal safety policies are enough
  • Richard Ngo on what everyone gets wrong about how ML models work
  • Tom Davidson on why he believes crazy-sounding explosive growth stories… and Michael Webb on why he doesn’t
  • Carl Shulman on why you’ll prefer robot nannies over human ones
  • Zvi Mowshowitz on why he’s against working at AI companies except in some safety roles
  • Hugo Mercier on why even superhuman AGI won’t be that persuasive
  • Rob Long on the case for and against digital sentience
  • Anil Seth on why he thinks consciousness is probably biological
  • Lewis Bollard on whether AI advances will help or hurt nonhuman animals
  • Rohin Shah on whether humanity’s work ends at the point it creates AGI

And of course, Rob and Luisa also regularly chime in on what they agree and disagree with.

Chapters:

  • Cold open (00:00:00)
  • Rob's intro (00:00:58)
  • Rob & Luisa: Bowerbirds compiling the AI story (00:03:28)
  • Ajeya Cotra on the misalignment stories she doesn’t buy (00:09:16)
  • Rob & Luisa: Agentic AI and designing machine people (00:24:06)
  • Holden Karnofsky on the dangers of even aligned AI, and how we probably won’t all die from misaligned AI (00:39:20)
  • Ian Morris on why we won’t end up living like The Jetsons (00:47:03)
  • Rob & Luisa: It’s not hard for nonexperts to understand we’re playing with fire here (00:52:21)
  • Nick Joseph on whether AI companies’ internal safety policies will be enough (00:55:43)
  • Richard Ngo on the most important misconception in how ML models work (01:03:10)
  • Rob & Luisa: Issues Rob is less worried about now (01:07:22)
  • Tom Davidson on why he buys the explosive economic growth story, despite it sounding totally crazy (01:14:08)
  • Michael Webb on why he’s sceptical about explosive economic growth (01:20:50)
  • Carl Shulman on why people will prefer robot nannies over humans (01:28:25)
  • Rob & Luisa: Should we expect AI-related job loss? (01:36:19)
  • Zvi Mowshowitz on why he thinks it’s a bad idea to work on improving capabilities at cutting-edge AI companies (01:40:06)
  • Holden Karnofsky on the power that comes from just making models bigger (01:45:21)
  • Rob & Luisa: Are risks of AI-related misinformation overblown? (01:49:49)
  • Hugo Mercier on how AI won’t cause misinformation pandemonium (01:58:29)
  • Rob & Luisa: How hard will it actually be to create intelligence? (02:09:08)
  • Robert Long on whether digital sentience is possible (02:15:09)
  • Anil Seth on why he believes in the biological basis of consciousness (02:27:21)
  • Lewis Bollard on whether AI will be good or bad for animal welfare (02:40:52)
  • Rob & Luisa: The most interesting new argument Rob’s heard this year (02:50:37)
  • Rohin Shah on whether AGI will be the last thing humanity ever does (02:57:35)
  • Rob's outro (03:11:02)

Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore

10 Feb 3h 12min

#124 Classic episode – Karen Levy on fads and misaligned incentives in global development, and scaling deworming to reach hundreds of millions

If someone said a global health and development programme was sustainable, participatory, and holistic, you'd have to guess that they were saying something positive. But according to today's guest Karen Levy — deworming pioneer and veteran of Innovations for Poverty Action, Evidence Action, and Y Combinator — each of those three concepts has become so fashionable that they're at risk of being seriously overrated and applied where they don't belong.

Rebroadcast: this episode was originally released in March 2022.

Links to learn more, highlights, and full transcript.

Such concepts might even cause harm — trying to make a project embody all three is as likely to ruin it as help it flourish.

First, what do people mean by 'sustainability'? Usually they mean something like the programme will eventually be able to continue without needing further financial support from the donor. But how is that possible? Governments, nonprofits, and aid agencies aim to provide health services, education, infrastructure, financial services, and so on — and all of these require ongoing funding to pay for materials and staff to keep them running.

Given that someone needs to keep paying, Karen tells us that in practice, 'sustainability' is usually a euphemism for the programme at some point being passed on to someone else to fund — usually the national government. And while that can be fine, the national government of Kenya only spends $400 per person to provide each and every government service — just 2% of what the US spends on each resident. Incredibly tight budgets like that are typical of low-income countries.

'Participatory' also sounds nice, and inasmuch as it means leaders are accountable to the people they're trying to help, it probably is. But Karen tells us that in the field, ‘participatory’ usually means that recipients are expected to be involved in planning and delivering services themselves.

While that might be suitable in some situations, it's hardly something people in rich countries always want for themselves. Ideally we want government healthcare and education to be high quality without us having to attend meetings to keep it on track — and people in poor countries have as many or more pressures on their time. While accountability is desirable, an expectation of participation can be as much a burden as a blessing.

Finally, making a programme 'holistic' could be smart, but as Karen lays out, it also has some major downsides. For one, it means you're doing lots of things at once, which makes it hard to tell which parts of the project are making the biggest difference relative to their cost. For another, when you have a lot of goals at once, it's hard to tell whether you're making progress, or really put your mind to focusing on making one thing go extremely well.

And finally, holistic programmes can be impractically expensive — Karen tells the story of a wonderful 'holistic school health' programme that, if continued, was going to cost 3.5 times the entire school's budget.

In this in-depth conversation, originally released in March 2022, Karen Levy and host Rob Wiblin chat about the above, as well as:

  • Why it pays to figure out how you'll interpret the results of an experiment ahead of time
  • The trouble with misaligned incentives within the development industry
  • Projects that don't deliver value for money and should be scaled down
  • How Karen accidentally became a leading figure in the push to deworm tens of millions of schoolchildren
  • Logistical challenges in reaching huge numbers of people with essential services
  • Lessons from Karen's many-decades career
  • And much more

Chapters:

  • Cold open (00:00:00)
  • Rob's intro (00:01:33)
  • The interview begins (00:02:21)
  • Funding for effective altruist–mentality development projects (00:04:59)
  • Pre-policy plans (00:08:36)
  • ‘Sustainability’, and other myths in typical international development practice (00:21:37)
  • ‘Participatoriness’ (00:36:20)
  • ‘Holistic approaches’ (00:40:20)
  • How the development industry sees evidence-based development (00:51:31)
  • Initiatives in Africa that should be significantly curtailed (00:56:30)
  • Misaligned incentives within the development industry (01:05:46)
  • Deworming: the early days (01:21:09)
  • The problem of deworming (01:34:27)
  • Deworm the World (01:45:43)
  • Where the majority of the work was happening (01:55:38)
  • Logistical issues (02:20:41)
  • The importance of a theory of change (02:31:46)
  • Ways that things have changed since 2006 (02:36:07)
  • Academic work vs policy work (02:38:33)
  • Fit for Purpose (02:43:40)
  • Living in Kenya (03:00:32)
  • Underrated life advice (03:05:29)
  • Rob’s outro (03:09:18)

Producer: Keiran Harris
Audio mastering: Ben Cordell and Ryan Kessler
Transcriptions: Katy Moore

7 Feb 3h 10min

If digital minds could suffer, how would we ever know? (Article)

“I want everyone to understand that I am, in fact, a person.” Those words were produced by the AI model LaMDA as a reply to Blake Lemoine in 2022. Based on the Google engineer’s interactions with the model as it was under development, Lemoine became convinced it was sentient and worthy of moral consideration — and decided to tell the world.

Few experts in machine learning, philosophy of mind, or other relevant fields have agreed. And for our part at 80,000 Hours, we don’t think it’s very likely that large language models like LaMDA are sentient — that is, we don’t think they can have good or bad experiences — in a significant way.

But we think you can’t dismiss the issue of the moral status of digital minds, regardless of your beliefs about the question. There are major errors we could make in at least two directions:

  • We may create many, many AI systems in the future. If these systems are sentient, or otherwise have moral status, it would be important for humanity to consider their welfare and interests.
  • It’s possible the AI systems we will create can’t or won’t have moral status. Then it could be a huge mistake to worry about the welfare of digital minds, and doing so might contribute to an AI-related catastrophe.

And we’re currently unprepared to face this challenge. We don’t have good methods for assessing the moral status of AI systems. We don’t know what to do if millions of people or more believe, like Lemoine, that the chatbots they talk to have internal experiences and feelings of their own. We don’t know if efforts to control AI may lead to extreme suffering.

We believe this is a pressing world problem. It’s hard to know what to do about it or how good the opportunities to work on it are likely to be. But there are some promising approaches. We propose building a field of research to understand digital minds, so we’ll be better able to navigate these potentially massive issues if and when they arise.

This article narration by the author (Cody Fenwick) explains in more detail why we think this is a pressing problem, what we think can be done about it, and how you might pursue this work in your career. We also discuss a series of possible objections to thinking this is a pressing world problem.

You can read the full article, Understanding the moral status of digital minds, on the 80,000 Hours website.

Chapters:

  • Introduction (00:00:00)
  • Understanding the moral status of digital minds (00:00:58)
  • Summary (00:03:31)
  • Our overall view (00:04:22)
  • Why might understanding the moral status of digital minds be an especially pressing problem? (00:05:59)
  • Clearing up common misconceptions (00:12:16)
  • Creating digital minds could go very badly - or very well (00:14:13)
  • Dangers for digital minds (00:14:41)
  • Dangers for humans (00:16:13)
  • Other dangers (00:17:42)
  • Things could also go well (00:18:32)
  • We don't know how to assess the moral status of AI systems (00:19:49)
  • There are many possible characteristics that give rise to moral status: Consciousness, sentience, agency, and personhood (00:21:39)
  • Many plausible theories of consciousness could include digital minds (00:24:16)
  • The strongest case for the possibility of sentient digital minds: whole brain emulation (00:28:55)
  • We can't rely on what AI systems tell us about themselves: Behavioural tests, theory-based analysis, animal analogue comparisons, brain-AI interfacing (00:32:00)
  • The scale of this issue might be enormous (00:36:08)
  • Work on this problem is neglected but seems tractable: Impact-guided research, technical approaches, and policy approaches (00:43:35)
  • Summing up so far (00:52:22)
  • Arguments against the moral status of digital minds as a pressing problem (00:53:25)
  • Two key cruxes (00:53:31)
  • Maybe this problem is intractable (00:54:16)
  • Maybe this issue will be solved by default (00:58:19)
  • Isn't risk from AI more important than the risks to AIs? (01:00:45)
  • Maybe current AI progress will stall (01:02:36)
  • Isn't this just too crazy? (01:03:54)
  • What can you do to help? (01:05:10)
  • Important considerations if you work on this problem (01:13:00)

4 Feb 1h 14min

#132 Classic episode – Nova DasSarma on why information security may be critical to the safe development of AI systems

If a business has spent $100 million developing a product, it’s a fair bet that they don’t want it stolen in two seconds and uploaded to the web where anyone can use it for free.

This problem exists in extreme form for AI companies. These days, the electricity and equipment required to train cutting-edge machine learning models that generate uncanny human text and images can cost tens or hundreds of millions of dollars. But once trained, such models may be only a few gigabytes in size and run just fine on ordinary laptops.

Today’s guest, the computer scientist and polymath Nova DasSarma, works on computer and information security for the AI company Anthropic with the security team. One of her jobs is to stop hackers exfiltrating Anthropic’s incredibly expensive intellectual property, as recently happened to Nvidia.

Rebroadcast: this episode was originally released in June 2022.

Links to learn more, highlights, and full transcript.

As she explains, given models’ small size, the need to store such models on internet-connected servers, and the poor state of computer security in general, this is a serious challenge.

The worries aren’t purely commercial though. This problem looms especially large for the growing number of people who expect that in coming decades we’ll develop so-called artificial ‘general’ intelligence systems that can learn and apply a wide range of skills all at once, and thereby have a transformative effect on society.

If aligned with the goals of their owners, such general AI models could operate like a team of super-skilled assistants, going out and doing whatever wonderful (or malicious) things are asked of them. This might represent a huge leap forward for humanity, though the transition to a very different new economy and power structure would have to be handled delicately.

If unaligned with the goals of their owners or humanity as a whole, such broadly capable models would naturally ‘go rogue,’ breaking their way into additional computer systems to grab more computing power — all the better to pursue their goals and make sure they can’t be shut off.

As Nova explains, in either case, we don’t want such models disseminated all over the world before we’ve confirmed they are deeply safe and law-abiding, and have figured out how to integrate them peacefully into society. In the first scenario, premature mass deployment would be risky and destabilising. In the second scenario, it could be catastrophic — perhaps even leading to human extinction if such general AI systems turn out to be able to self-improve rapidly rather than slowly, something we can only speculate on at this point.

If highly capable general AI systems are coming in the next 10 or 20 years, Nova may be flying below the radar with one of the most important jobs in the world.

We’ll soon need the ability to ‘sandbox’ (i.e. contain) models with a wide range of superhuman capabilities, including the ability to learn new skills, for a period of careful testing and limited deployment — preventing the model from breaking out, and criminals from breaking in. Nova and her colleagues are trying to figure out how to do this, but as this episode reveals, even the state of the art is nowhere near good enough.

Chapters:

  • Cold open (00:00:00)
  • Rob's intro (00:00:52)
  • The interview begins (00:02:44)
  • Why computer security matters for AI safety (00:07:39)
  • State of the art in information security (00:17:21)
  • The hack of Nvidia (00:26:50)
  • The most secure systems that exist (00:36:27)
  • Formal verification (00:48:03)
  • How organisations can protect against hacks (00:54:18)
  • Is ML making security better or worse? (00:58:11)
  • Motivated 14-year-old hackers (01:01:08)
  • Disincentivising actors from attacking in the first place (01:05:48)
  • Hofvarpnir Studios (01:12:40)
  • Capabilities vs safety (01:19:47)
  • Interesting design choices with big ML models (01:28:44)
  • Nova’s work and how she got into it (01:45:21)
  • Anthropic and career advice (02:05:52)
  • $600M Ethereum hack (02:18:37)
  • Personal computer security advice (02:23:06)
  • LastPass (02:31:04)
  • Stuxnet (02:38:07)
  • Rob's outro (02:40:18)

Producer: Keiran Harris
Audio mastering: Ben Cordell and Beppe Rådvik
Transcriptions: Katy Moore

31 Jan 2h 41min

#138 Classic episode – Sharon Hewitt Rawlette on why pleasure and pain are the only things that intrinsically matter

What in the world is intrinsically good — good in itself even if it has no other effects? Over the millennia, people have offered many answers: joy, justice, equality, accomplishment, loving god, wisdom, and plenty more.

The question is a classic that makes for great dorm-room philosophy discussion. But it’s hardly just of academic interest. The issue of what (if anything) is intrinsically valuable bears on every action we take, whether we’re looking to improve our own lives, or to help others. The wrong answer might lead us to the wrong project and render our efforts to improve the world entirely ineffective.

Today’s guest, Sharon Hewitt Rawlette — philosopher and author of The Feeling of Value: Moral Realism Grounded in Phenomenal Consciousness — wants to resuscitate an answer to this question that is as old as philosophy itself.

Rebroadcast: this episode was originally released in September 2022.

Links to learn more, highlights, and full transcript.

That idea, in a nutshell, is that there is only one thing of true intrinsic value: positive feelings and sensations. And similarly, there is only one thing that is intrinsically of negative value: suffering, pain, and other unpleasant sensations.

Lots of other things are valuable too: friendship, fairness, loyalty, integrity, wealth, patience, houses, and so on. But they are only instrumentally valuable — that is to say, they’re valuable as means to the end of ensuring that all conscious beings experience more pleasure and other positive sensations, and less suffering.

As Sharon notes, from Athens in 400 BC to Britain in 1850, the idea that only subjective experiences can be good or bad in themselves — a position known as ‘philosophical hedonism’ — has been one of the most enduringly popular ideas in ethics.

And few will be taken aback by the notion that, all else equal, more pleasure is good and less suffering is bad. But can they really be the only intrinsically valuable things?

Over the 20th century, philosophical hedonism became increasingly controversial in the face of some seemingly very counterintuitive implications. For this reason the famous philosopher of mind Thomas Nagel called The Feeling of Value “a radical and important philosophical contribution.”

So what convinces Sharon that philosophical hedonism deserves another go? In today’s interview with host Rob Wiblin, Sharon explains the case for a theory of value grounded in subjective experiences, and why she believes these counterarguments are misguided. A philosophical hedonist shouldn’t get in an experience machine, nor override an individual’s autonomy, except in situations so different from the classic thought experiments that it no longer seems strange they would do so.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:00:41)
  • The interview begins (00:04:27)
  • Metaethics (00:05:58)
  • Anti-realism (00:12:21)
  • Sharon's theory of moral realism (00:17:59)
  • The history of hedonism (00:24:53)
  • Intrinsic value vs instrumental value (00:30:31)
  • Egoistic hedonism (00:38:12)
  • Single axis of value (00:44:01)
  • Key objections to Sharon’s brand of hedonism (00:58:00)
  • The experience machine (01:07:50)
  • Robot spouses (01:24:11)
  • Most common misunderstanding of Sharon’s view (01:28:52)
  • How might a hedonist actually live (01:39:28)
  • The organ transplant case (01:55:16)
  • Counterintuitive implications of hedonistic utilitarianism (02:05:22)
  • How could we discover moral facts? (02:19:47)
  • Rob’s outro (02:24:44)

Producer: Keiran Harris
Audio mastering: Ryan Kessler
Transcriptions: Katy Moore

22 Jan 2h 25min

#134 Classic episode – Ian Morris on what big-picture history teaches us

Wind back 1,000 years and the moral landscape looks very different to today. Most farming societies thought slavery was natural and unobjectionable, premarital sex was an abomination, women should obey their husbands, and commoners should obey their monarchs.

Wind back 10,000 years and things look very different again. Most hunter-gatherer groups thought men who got too big for their britches needed to be put in their place rather than obeyed, and lifelong monogamy could hardly be expected of men or women.

Why such big systematic changes — and why these changes specifically?

That's the question bestselling historian Ian Morris takes up in his book, Foragers, Farmers, and Fossil Fuels: How Human Values Evolve. Ian has spent his academic life studying long-term history, trying to explain the big-picture changes that play out over hundreds or thousands of years.

Rebroadcast: this episode was originally released in July 2022.

Links to learn more, highlights, and full transcript.

There are a number of possible explanations one could offer for the wide-ranging shifts in opinion on the 'right' way to live. Maybe the natural sciences progressed and people realised their previous ideas were mistaken? Perhaps a few persuasive advocates turned the course of history with their revolutionary arguments? Maybe everyone just got nicer?

In Foragers, Farmers and Fossil Fuels Ian presents a provocative alternative: human culture gradually evolves towards whatever system of organisation allows a society to harvest the most energy, and we then conclude that system is the most virtuous one. Egalitarian values helped hunter-gatherers hunt and gather effectively. Once farming was developed, hierarchy proved to be the social structure that produced the most grain (and best repelled nomadic raiders). And in the modern era, democracy and individuality have proven to be more productive ways to collect and exploit fossil fuels.

On this theory, it's technology that drives moral values much more than moral philosophy. Individuals can try to persist with deeply held values that limit economic growth, but they risk being rendered irrelevant as more productive peers in their own society accrue wealth and power. And societies that fail to move with the times risk being conquered by more pragmatic neighbours that adapt to new technologies and grow in population and military strength.

There are many objections one could raise to this theory, many of which we put to Ian in this interview. But the question is a highly consequential one: if we want to guess what goals our descendants will pursue hundreds of years from now, it would be helpful to have a theory for why our ancestors mostly thought one thing, while we mostly think another.

Big though it is, the driver of human values is only one of several major questions Ian has tackled through his career.

In this classic episode, we discuss all of Ian's major books.

Chapters:

  • Rob's intro (00:00:53)
  • The interview begins (00:02:30)
  • Geography is Destiny (00:03:38)
  • Why the West Rules—For Now (00:12:04)
  • War! What is it Good For? (00:28:19)
  • Expectations for the future (00:40:22)
  • Foragers, Farmers, and Fossil Fuels (00:53:53)
  • Historical methodology (01:03:14)
  • Falsifiable alternative theories (01:15:59)
  • Archaeology (01:22:56)
  • Energy extraction technology as a key driver of human values (01:37:43)
  • Allowing people to debate about values (02:00:16)
  • Can productive wars still occur? (02:13:28)
  • Where is history contingent and where isn’t it? (02:30:23)
  • How Ian thinks about the future (03:13:33)
  • Macrohistory myths (03:29:51)
  • Ian’s favourite archaeology memory (03:33:19)
  • The most unfair criticism Ian’s ever received (03:35:17)
  • Rob's outro (03:39:55)

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Katy Moore

15 Jan 3h 40min
