Lecture 8 - Deep Value Learning Methods

04.08

Instructor: Yaodong Yang

Topics Covered

  1. Review
    • 1.1 Introduction to Reinforcement Learning
    • 1.2 Reinforcement Learning Based on Dynamic Programming
    • 1.3 Temporal Difference Learning
  2. About TD(λ)
    • 2.1 SARSA
      • 2.1.1 SARSA Algorithm
      • 2.1.2 On-policy Control in SARSA
    • 2.2 Off-policy Algorithms
    • 2.3 Q-Learning
      • 2.3.1 Q-Learning Control Algorithm
      • 2.3.2 Off-policy Control in Q-Learning
      • 2.3.3 Convergence of Q-Learning
      • 2.3.4 Application Examples
    • 2.4 Unifying SARSA and Q-Learning
    • 2.5 Comparison Between Dynamic Programming and Temporal Difference Methods
    • 2.6 N-step Cumulative Rewards
    • 2.7 λ Return
    • 2.8 Advantage Function
    • 2.9 Generalized Advantage Estimation (GAE)
    • 2.10 Backward View
    • 2.11 Eligibility Traces
    • 2.12 TD(λ) with Neural Networks
  3. Deep Reinforcement Learning
    • 3.1 Deep Reinforcement Learning = Deep Learning + Reinforcement Learning
    • 3.2 Classification
      • 3.2.1 Value-based Methods
      • 3.2.2 Stochastic Policy-based Methods
      • 3.2.3 Deterministic Policy-based Methods
    • 3.3 Deep Q-Network (DQN)
      • 3.3.1 Intuition and Solution Overview
      • 3.3.2 Algorithm Flow
      • 3.3.3 Overestimation in Q-Learning
      • 3.3.4 Double DQN
      • 3.3.5 Experience Replay
      • 3.3.6 Dueling DQN
      • 3.3.7 Summary
    • 3.4 Discretization of Continuous Markov Decision Processes
    • 3.5 Bucket-based Methods for Large-Scale Markov Decision Processes
  4. Next Class: Policy Gradient Methods and Deep Policy Gradient Methods