Lecture 9 - Policy Gradient Methods

04.15

Instructor: Yaodong Yang

Topics Covered

  1. Review
    • 1.1 Tabular Reinforcement Learning
      • 1.1.1 Model-Based Dynamic Programming
      • 1.1.2 Model-Free Reinforcement Learning
    • 1.2 Parametric Value Function Approximation
    • 1.3 Stochastic Gradient Descent (SGD)
    • 1.4 State-Action Value Function Approximation
      • 1.4.1 Linear State-Action Value Function Approximation
      • 1.4.2 Temporal Difference State-Action Value Function Approximation
      • 1.4.3 Value and Policy Approximation
    • 1.5 End-to-End Reinforcement Learning
    • 1.6 Value Function Approximation
    • 1.7 Q-Function Approximation
    • 1.8 Deep Q-Network (DQN)
  2. Foundations of the Policy Gradient Method
    • 2.1 Policy-Based Reinforcement Learning
    • 2.2 Policy Gradient
    • 2.3 Policy Gradient Theorem
    • 2.4 Monte Carlo Policy Gradient
      (a minimal illustrative sketch of this algorithm appears after this outline)
    • 2.5 MLE vs. Policy Gradient
    • 2.6 Policy Gradient Theorem (Alternative Perspective)
    • 2.7 Softmax Stochastic Policy
    • 2.8 Variance Reduction Techniques
    • 2.9 Actor-Critic Methods
    • 2.10 Generalized Advantage Estimation (GAE)
    • 2.11 Approximate Methods for Value and Policy Approximation
    • 2.12 Summary of Policy Gradient Algorithms
  3. Deep Policy Gradient Methods
    • 3.1 From AIGC to AIGA (AI-Generated Content to AI-Guided Actions)
    • 3.2 Diffusion Policy
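
To make the outline's Monte Carlo policy gradient item (2.4) concrete, here is a minimal REINFORCE sketch with a tabular softmax stochastic policy (item 2.7). Everything in it is illustrative rather than taken from the lecture slides: the 2-state, 2-action MDP, the constants (GAMMA, LR, horizon), and the helper names (softmax_policy, run_episode) are assumptions chosen only to keep the example self-contained.

```python
# A minimal REINFORCE (Monte Carlo policy gradient) sketch with a softmax
# stochastic policy. The toy 2-state, 2-action MDP is invented purely for
# illustration and is not part of the lecture material.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
GAMMA, LR = 0.99, 0.1

# Hypothetical deterministic dynamics: P[s, a] -> next state, R[s, a] -> reward.
P = np.array([[0, 1], [1, 0]])
R = np.array([[0.0, 1.0], [1.0, 0.0]])

theta = np.zeros((N_STATES, N_ACTIONS))  # tabular policy parameters (logits)

def softmax_policy(s):
    """Return pi(a | s) under a tabular softmax parameterization."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def run_episode(horizon=20):
    """Sample one trajectory and return a list of (state, action, reward)."""
    s, traj = 0, []
    for _ in range(horizon):
        probs = softmax_policy(s)
        a = rng.choice(N_ACTIONS, p=probs)
        r = R[s, a]
        traj.append((s, a, r))
        s = P[s, a]
    return traj

for episode in range(500):
    traj = run_episode()
    G = 0.0
    # Walk the trajectory backwards, accumulating the return G_t, and apply
    # the REINFORCE update: theta <- theta + lr * G_t * grad log pi(a_t | s_t).
    for s, a, r in reversed(traj):
        G = r + GAMMA * G
        probs = softmax_policy(s)
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0  # gradient of log softmax w.r.t. the logits of state s
        theta[s] += LR * G * grad_log_pi
```

Subtracting a state-dependent baseline from G_t (item 2.8) leaves the gradient estimate unbiased while reducing its variance; replacing the Monte Carlo return with a learned critic leads to the actor-critic methods of item 2.9.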