04.08
Instructor: Yaodong Yang
Topics Covered
- Review
- 1.1 Introduction to Reinforcement Learning
- 1.2 Reinforcement Learning Based on Dynamic Programming
- 1.3 Temporal Difference Learning
- About TD(λ)
- 2.1 SARSA
- 2.1.1 SARSA Algorithm
- 2.1.2 On-policy Control in SARSA
- 2.2 Off-policy Algorithms
- 2.3 Q-Learning
- 2.3.1 Q-Learning Control Algorithm
- 2.3.2 Off-policy Control in Q-Learning
- 2.3.3 Convergence of Q-Learning
- 2.3.4 Application Examples
- 2.4 Unifying SARSA and Q-Learning
- 2.5 Comparison Between Dynamic Programming and Temporal Difference Methods
- 2.6 N-step Cumulative Rewards
- 2.7 λ-Return
- 2.8 Advantage Function
- 2.9 Generalized Advantage Estimation (GAE)
- 2.10 Backward View
- 2.11 Eligibility Traces
- 2.12 TD(λ) with Neural Networks
- Deep Reinforcement Learning
- 3.1 Deep Reinforcement Learning = Deep Learning + Reinforcement Learning
- 3.2 Classification
- 3.2.1 Value-based Methods
- 3.2.2 Stochastic Policy-based Methods
- 3.2.3 Deterministic Policy-based Methods
- 3.3 Deep Q-Network (DQN)
- 3.3.1 Intuition and Solution Overview
- 3.3.2 Algorithm Flow
- 3.3.3 Overestimation in Q-Learning
- 3.3.4 Double DQN
- 3.3.5 Experience Replay
- 3.3.6 Dueling DQN
- 3.3.7 Summary
- 3.4 Discretization of Continuous Markov Decision Processes
- 3.5 Bucket-based Methods for Large-Scale Markov Decision Processes
- Next Class: Policy Gradient Methods and Deep Policy Gradient Methods