Diffusion LLMs - The Fastest LLMs Ever Built | Stefano Ermon, cofounder of Inception Labs
Stefano Ermon is the cofounder of Inception Labs and an associate professor at Stanford. Inception is developing a new type of AI model called Diffusion LLMs.

Stefano's favorite book: If on a Winter's Night a Traveler (Author: Italo Calvino)

(00:01) Introduction
(00:38) What are autoregressive LLMs and how do they work
(02:28) How diffusion LLMs rethink generation
(04:02) The ceiling of autoregressive LLMs: cost, latency, reliability
(06:19) Why diffusion LLMs are commercially viable now
(09:12) Parallel refinement: how diffusion models generate text
(12:05) Understanding diffusion steps and efficiency
(13:49) Hardest engineering challenges at Inception
(15:23) From research to production: the power of data
(16:24) Where diffusion LLMs still lag behind
(18:18) Evaluations and benchmarks for diffusion LLMs
(20:20) Developer experience and OpenAI-compatible API
(21:47) Economics and GPU efficiency
(23:38) Hardware and runtime stack
(24:58) Competition and the evolving diffusion LLM landscape
(27:01) Where diffusion will win first: coding and agentic systems
(30:13) How diffusion changes infra, serving, and hardware design
(33:04) What’s next at Inception: reasoning and multimodality
(35:20) Rapid Fire Round

--------
Where to find Stefano Ermon:
LinkedIn: https://www.linkedin.com/in/ermon/

--------
Where to find Prateek Joshi:
Research column: https://www.infrastartups.com
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-infinite
X: https://x.com/prateekvjoshi
From "Infinite Curiosity Pod with Prateek Joshi"