
August 7, 2025 Taewoon Kim

From REINFORCE to PPO: The Complete On-Policy RL Journey

Understanding the evolution from basic policy gradients to modern LLM fine-tuning algorithms

Motivation: Why On-Policy RL Matters for Modern AI

If you've been following the latest developments in large language models (LLMs), you've probably heard of GRPO (Group Relative P...
