Davidad Dalrymple: Towards Provably Safe AI

05 Sep 2024 • 80 min • EN

Episode 137: I spoke with Davidad Dalrymple about:

* His perspectives on AI risk
* ARIA (the UK’s Advanced Research and Invention Agency) and its Safeguarded AI Programme

Enjoy—and let me know what you think!

Davidad is a Programme Director at ARIA. He was most recently a Research Fellow in technical AI safety at Oxford. He co-invented the top-40 cryptocurrency Filecoin, led an international neuroscience collaboration, and was a senior software engineer at Twitter and multiple startups.

Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub for feedback, ideas, and guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:36) Calibration and optimism about breakthroughs
* (03:35) Calibration and AGI timelines, effects of AGI on humanity
* (07:10) Davidad’s thoughts on the Orthogonality Thesis
* (10:30) Understanding how our current direction relates to AGI and breakthroughs
* (13:33) What Davidad thinks is needed for AGI
* (17:00) Extracting knowledge
* (19:01) Cyber-physical systems and modeling frameworks
* (20:00) Continuities between Davidad’s earlier work and ARIA
* (22:56) Path dependence in technology, race dynamics
* (26:40) More on Davidad’s perspective on what might go wrong with AGI
* (28:57) Vulnerable world, interconnectedness of computers and control
* (34:52) Formal verification and world modeling, Open Agency Architecture
* (35:25) The Semantic Sufficiency Hypothesis
* (39:31) Challenges for modeling
* (43:44) The Deontic Sufficiency Hypothesis and mathematical formalization
* (49:25) Oversimplification and quantitative knowledge
* (53:42) Collective deliberation in expressing values for AI
* (55:56) ARIA’s Safeguarded AI Programme
* (59:40) Anthropic’s ASL levels
* (1:03:12) Guaranteed Safe AI
* (1:03:38) AI risk and (in)accurate world models
* (1:09:59) Levels of safety specifications for world models and verifiers — steps to achieve high safety
* (1:12:00) Davidad’s portfolio research approach and funding at ARIA
* (1:15:46) Earlier concerns about ARIA — Davidad’s perspective
* (1:19:26) Where to find more information on ARIA and the Safeguarded AI Programme
* (1:20:44) Outro

Links:

* Davidad’s Twitter
* ARIA homepage
* Safeguarded AI Programme
* Papers:
  * Guaranteed Safe AI
  * Davidad’s Open Agency Architecture for Safe Transformative AI
  * Dioptics: a Common Generalization of Open Games and Gradient-Based Learners (2019)
  * Asynchronous Logic Automata (2008)

Get full access to The Gradient at thegradientpub.substack.com/subscribe

From "The Gradient: Perspectives on AI"
