Lecture 12 - Cooperative Games

05.13

Instructor: Yaodong Yang

Topics Covered

  1. Concurrent Learning
    • 1.1 Synchronous vs. Asynchronous Training
    • 1.2 Practical Considerations: Observations, States, and Histories
  2. Training and Execution Models
    • 2.1 Training vs. Execution Paradigms
    • 2.2 Challenges in Multi-Agent Reinforcement Learning Deployment
  3. Multi-Agent Policy Gradient Algorithms
    • 3.1 Foundations of Policy Gradient Theory
    • 3.2 Multi-Agent Policy Gradient Theorem
    • 3.3 Centralized Critic Architecture
    • 3.4 Centralized Advantage Actor-Critic
    • 3.5 Counterfactual Multi-Agent Policy Gradient (COMA)
    • 3.6 Equilibrium Selection in Policy Gradient Methods
    • 3.7 Pareto Actor-Critic Algorithm
  4. Value Decomposition in Common-Reward Games
    • 4.1 Individual-Global-Max (IGM) Property
    • 4.2 Linear Value Decomposition Techniques
    • 4.3 QMIX Algorithm
    • 4.4 Value Decomposition in Matrix and Climbing Games
  5. Agent Modeling with Deep Learning
    • 5.1 Deep Agent Modeling Techniques
    • 5.2 Joint-Action Learning with Deep Models
    • 5.3 Learning Compact Representations of Agent Policies
  6. Parameter and Experience Sharing
    • 6.1 Motivation for Sharing
    • 6.2 Environments with Homogeneous Agents
    • 6.3 Parameter Sharing Strategies
  7. Heterogeneous-Agent Reinforcement Learning
    • 7.1 Cooperative Markov Game
    • 7.2 Environment State Distribution
    • 7.3 Value Functions
    • 7.4 Multi-Agent Advantage Decomposition Lemma
    • 7.5 Trust Region Policy Optimization (TRPO)
    • 7.6 Heterogeneous-Agent Proximal Policy Optimization (HAPPO)
    • 7.7 Paper: Mirror Learning Unifies Policy Optimization
    • 7.8 Establishing Heterogeneous-Agent Mirror Learning
    • 7.9 HAML Instances & HARL Implementation
      • 7.9.1 MPE
      • 7.9.2 MAMuJoCo
      • 7.9.3 SMAC & SMACv2
      • 7.9.4 Google Research Football
      • 7.9.5 Bi-DexterousHands