Lecture 9 - Policy Gradient Methods

04.15

Instructor: Yaodong Yang

Topics Covered

  1. Review
    • 1.1 Tabular Reinforcement Learning
      • 1.1.1 Model-Based Dynamic Programming
      • 1.1.2 Model-Free Reinforcement Learning
    • 1.2 Parametric Value Function Approximation
    • 1.3 Stochastic Gradient Descent (SGD)
    • 1.4 State-Action Value Function Approximation
      • 1.4.1 Linear State-Action Value Function Approximation
      • 1.4.2 Temporal Difference State-Action Value Function Approximation
      • 1.4.3 Value and Policy Approximation
    • 1.5 End-to-End Reinforcement Learning
    • 1.6 Value Function Approximation
    • 1.7 Q-Function Approximation
    • 1.8 Deep Q-Network (DQN)
  2. Foundations of the Policy Gradient Method
    • 2.1 Policy-Based Reinforcement Learning
    • 2.2 Policy Gradient
    • 2.3 Policy Gradient Theorem
    • 2.4 Monte Carlo Policy Gradient
      (a minimal illustrative sketch of this algorithm appears after this outline)
    • 2.5 MLE vs. Policy Gradient
    • 2.6 Policy Gradient Theorem (Alternative Perspective)
    • 2.7 Softmax Stochastic Policy
    • 2.8 Variance Reduction Techniques
    • 2.9 Actor-Critic Methods
    • 2.10 Generalized Advantage Estimation (GAE)
    • 2.11 Approximate Methods for Value and Policy Approximation
    • 2.12 Summary of Policy Gradient Algorithms
  3. Deep Policy Gradient Methods
    • 3.1 From AIGC to AIGA (AI-Generated Content to AI-Guided Actions)
    • 3.2 Diffusion Policy
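
To make the outline's Monte Carlo policy gradient item (2.4) concrete, here is a minimal REINFORCE sketch with a tabular softmax stochastic policy (item 2.7). Everything in it is illustrative rather than taken from the lecture slides: the 2-state, 2-action MDP, the constants (GAMMA, LR, horizon), and the helper names (softmax_policy, run_episode) are assumptions chosen only to keep the example self-contained.

```python
# A minimal REINFORCE (Monte Carlo policy gradient) sketch with a softmax
# stochastic policy. The toy 2-state, 2-action MDP is invented purely for
# illustration and is not part of the lecture material.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
GAMMA, LR = 0.99, 0.1

# Hypothetical deterministic dynamics: P[s, a] -> next state, R[s, a] -> reward.
P = np.array([[0, 1], [1, 0]])
R = np.array([[0.0, 1.0], [1.0, 0.0]])

theta = np.zeros((N_STATES, N_ACTIONS))  # tabular policy parameters (logits)

def softmax_policy(s):
    """Return pi(a | s) under a tabular softmax parameterization."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def run_episode(horizon=20):
    """Sample one trajectory and return a list of (state, action, reward)."""
    s, traj = 0, []
    for _ in range(horizon):
        probs = softmax_policy(s)
        a = rng.choice(N_ACTIONS, p=probs)
        r = R[s, a]
        traj.append((s, a, r))
        s = P[s, a]
    return traj

for episode in range(500):
    traj = run_episode()
    G = 0.0
    # Walk the trajectory backwards, accumulating the return G_t, and apply
    # the REINFORCE update: theta <- theta + lr * G_t * grad log pi(a_t | s_t).
    for s, a, r in reversed(traj):
        G = r + GAMMA * G
        probs = softmax_policy(s)
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0  # gradient of log softmax w.r.t. the logits of state s
        theta[s] += LR * G * grad_log_pi
```

Subtracting a state-dependent baseline from G_t (item 2.8) leaves the gradient estimate unbiased while reducing its variance; replacing the Monte Carlo return with a learned critic leads to the actor-critic methods of item 2.9.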