05.13
Instructor: Yaodong Yang
Topics Covered
- 1. Concurrent Learning
  - 1.1 Synchronous vs. Asynchronous Training
  - 1.2 Practical Considerations: Observations, States, and Histories
- 2. Training and Execution Models
  - 2.1 Training vs. Execution Paradigms
  - 2.2 Challenges in Multi-Agent Reinforcement Learning Deployment
- 3. Multi-Agent Policy Gradient Algorithms
  - 3.1 Foundations of Policy Gradient Theory
  - 3.2 Multi-Agent Policy Gradient Theorem
  - 3.3 Centralized Critic Architecture
  - 3.4 Centralized Advantage Actor-Critic
  - 3.5 Counterfactual Multi-Agent Policy Gradient (COMA)
  - 3.6 Equilibrium Selection in Policy Gradient
  - 3.7 Pareto Actor-Critic Algorithm
- 4. Value Decomposition in Common-Reward Games
  - 4.1 Individual-Global-Max (IGM) Property
  - 4.2 Linear Value Decomposition Techniques
  - 4.3 QMIX Algorithm
  - 4.4 Value Decomposition in Matrix and Climbing Games
- 5. Agent Modeling with Deep Learning
  - 5.1 Deep Agent Modeling Techniques
  - 5.2 Joint-Action Learning with Deep Models
  - 5.3 Learning Compact Representations of Agent Policies
- 6. Parameter and Experience Sharing
  - 6.1 Motivation for Sharing
  - 6.2 Environments with Homogeneous Agents
  - 6.3 Parameter Sharing Strategies
- 7. Heterogeneous-Agent Reinforcement Learning
  - 7.1 Cooperative Markov Game
  - 7.2 Environment State Distribution
  - 7.3 Value Functions
  - 7.4 Multi-Agent Advantage Decomposition Lemma
  - 7.5 Trust Region Policy Optimization (TRPO)
  - 7.6 Heterogeneous-Agent Proximal Policy Optimization
  - 7.7 Paper: Mirror Learning Unifies Policy Optimization
  - 7.8 Establishing Heterogeneous-Agent Mirror Learning
  - 7.9 HAML Instances & HARL Implementation
    - 7.9.1 MPE
    - 7.9.2 MAMuJoCo
    - 7.9.3 SMAC & SMACv2
    - 7.9.4 Google Research Football
    - 7.9.5 Bi-DexterousHands