180: Reinforcement Learning

17 Mar 2025 • 112 min • EN
112 min
00:00
01:52:22
No file found

Intro topic: Grills News/Links: You can’t call yourself a senior until you’ve worked on a legacy projecthttps://www.infobip.com/developers/blog/seniors-working-on-a-legacy-projectRecraft might be the most powerful AI image platform I’ve ever used — here’s whyhttps://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-whyNASA has a list of 10 rules for software developmenthttps://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htmAMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GREhttps://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre  Book of the ShowPatrick: The Player of Games (Ian M Banks)https://a.co/d/1ZpUhGl (non-affiliate)Jason: Basic Roleplaying Universal Game Enginehttps://amzn.to/3ES4p5i Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h Tool of the ShowPatrick: Pokemon Sword and ShieldJason: Features and Labels ( https://fal.ai ) Topic: Reinforcement Learning Three types of AISupervised LearningUnsupervised LearningReinforcement LearningOnline vs Offline RLOptimization algorithmsValue optimizationSARSAQ-LearningPolicy optimizationPolicy GradientsActor-CriticProximal Policy OptimizationValue vs Policy OptimizationValue optimization is more intuitive (Value loss)Policy optimization is less intuitive at first (policy gradients)Converting values to policies in deep learning is difficultImitation LearningSupervised policy learningOften used to bootstrap reinforcement learningPolicy EvaluationPropensity scoring versus model-basedChallenges to training RL modelTwo optimization loopsCollecting feedback vs updating the modelDifficult optimization targetPolicy evaluationRLHF &  GRPO ★ Support this podcast on Patreon ★

From "Programming Throwdown"

Listen on your iPhone

Download our iOS app and listen to interviews anywhere. Enjoy all of the listener functions in one slick package. Why not give it a try?

App Store Logo
application screenshot

Popular categories