Lecture 7 - Introduction to Reinforcement Learning

Date: 04.01

Instructor: Yaodong Yang

Topics Covered

  1. Markov Decision Process
    • 1.1 Stochastic Process
    • 1.2 Markov Process
      • 1.2.1 Definition
      • 1.2.2 Property
    • 1.3 Markov Decision Process (MDP)
    • 1.4 Dynamics of MDP
      • 1.4.1 Supervised & Unsupervised Learning: Model
      • 1.4.2 Reinforcement Learning: Agent
    • 1.5 Occupancy Measure
      • 1.5.1 Policy
      • 1.5.2 Cumulative Reward
    • 1.6 Assumptions of Markov Decision Process
    • 1.7 Stationary Policy
  2. Reinforcement Learning Based on Dynamic Programming
    • 2.1 MDP Objectives and Policies
    • 2.2 Bellman’s Equation for Value Function
    • 2.3 Optimal Value Function
    • 2.4 Value Iteration & Policy Iteration
      • 2.4.1 Value Iteration
      • 2.4.2 Synchronous vs. Asynchronous Value Iteration
      • 2.4.3 Policy Iteration
      • 2.4.4 Example: Policy Evaluation
      • 2.4.5 Comparison: Value Iteration & Policy Iteration
    • 2.5 Policies in Policy Iteration
      • 2.5.1 Policy Optimization
      • 2.5.2 Proof
  3. Model-Based Reinforcement Learning
    • 3.1 Learning an MDP Model
      • 3.1.1 Model Learning
      • 3.1.2 Optimization Strategy
    • 3.2 Summary of Markov Decision Process
  4. Value Function Estimation
    • 4.1 Model-Free Reinforcement Learning
  5. Monte Carlo Methods
    • 5.1 Monte Carlo Value Estimation
    • 5.2 Incremental Monte Carlo Updates
    • 5.3 Monte Carlo Value Estimation Example
    • 5.4 Heuristic Sampling Example
    • 5.5 Importance Sampling
  6. Temporal Difference Learning
    • 6.1 Monte Carlo vs. Temporal Difference (MC vs. TD)
    • 6.2 Bias vs. Variance
    • 6.3 Example: Random Walk
    • 6.4 Backup
    • 6.5 Multi-Step Temporal Difference Learning
    • 6.6 SARSA
      • 6.6.1 On-Policy Control
      • 6.6.2 ε-Greedy Policy Improvement