Taewoon Kim

Blog

Posts tagged policy-gradients.

August 7, 2025 Taewoon Kim

From REINFORCE to PPO: The Complete On-Policy RL Journey

Understanding the evolution from basic policy gradients to modern LLM fine-tuning algorithms

Motivation: Why On-Policy RL Matters for Modern AI

If you've been following the latest developments in large language models (LLMs), you've probably heard of GRPO (Group Relative P...

reinforcement-learning policy-gradients ppo grpo