Diffusion LLMs - The Fastest LLMs Ever Built | Stefano Ermon, cofounder of Inception Labs

09 Oct 2025 • 39 min • EN

Stefano Ermon is the cofounder of Inception Labs and an associate professor at Stanford. Inception is developing a new type of AI model called Diffusion LLMs.

Stefano's favorite book: If on a Winter's Night a Traveler (Author: Italo Calvino)

(00:01) Introduction
(00:38) What are autoregressive LLMs and how do they work
(02:28) How diffusion LLMs rethink generation
(04:02) The ceiling of autoregressive LLMs: cost, latency, reliability
(06:19) Why diffusion LLMs are commercially viable now
(09:12) Parallel refinement: how diffusion models generate text
(12:05) Understanding diffusion steps and efficiency
(13:49) Hardest engineering challenges at Inception
(15:23) From research to production: the power of data
(16:24) Where diffusion LLMs still lag behind
(18:18) Evaluations and benchmarks for diffusion LLMs
(20:20) Developer experience and OpenAI-compatible API
(21:47) Economics and GPU efficiency
(23:38) Hardware and runtime stack
(24:58) Competition and the evolving diffusion LLM landscape
(27:01) Where diffusion will win first: coding and agentic systems
(30:13) How diffusion changes infra, serving, and hardware design
(33:04) What’s next at Inception: reasoning and multimodality
(35:20) Rapid Fire Round

--------

Where to find Stefano Ermon:
LinkedIn: https://www.linkedin.com/in/ermon/

--------

Where to find Prateek Joshi:
Research column: https://www.infrastartups.com
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-infinite
X: https://x.com/prateekvjoshi

From "Infinite Curiosity Pod with Prateek Joshi"
