Lecture 11 - Multi-Agent Reinforcement Learning

04.29

Instructor: Yaodong Yang

Topics Covered

  1. Introduction
    • 1.1 MARL Applications
    • 1.2 Multi-Agent Systems
    • 1.3 Challenges in MARL
    • 1.4 Multi-Agent Credit Assignment
  2. Game Models
    • 2.1 Normal-Form Games
    • 2.2 Classes of Games
    • 2.3 Repeated Normal-Form Games
    • 2.4 Stochastic Games
    • 2.5 Example: Level-Based Foraging
    • 2.6 Partially Observable Stochastic Games (POSG)
      • 2.6.1 Formal Definition
      • 2.6.2 Interaction Process
  3. Modeling Communication
    • 3.1 Communication Modeling
    • 3.2 Communication as Actions
    • 3.3 Communication in Stochastic Games
    • 3.4 Communication in POSGs
  4. Assumptions in Games
    • 4.1 Assumptions in Game Theory
    • 4.2 Assumptions in MARL
    • 4.3 Reinforcement Learning vs. Game Theory
    • 4.4 Solution Concepts in Games
  5. Joint Policy and Expected Return
    • 5.1 Joint Policy Representation
    • 5.2 History-Based Expected Return
    • 5.3 Recursive Expected Return
  6. MARL Learning Framework
    • 6.1 Learning Process in MARL
    • 6.2 Policy Inputs
    • 6.3 Convergence Analysis
    • 6.4 Reducing to Single-Agent Reinforcement Learning
      • 6.4.1 Centralized Learning and Centralized Q-Learning
      • 6.4.2 Independent Learning
  7. Independent Learning
    • 7.1 Independent Q-Learning (IQL)
    • 7.2 IQL and Centralized Q-Learning in Level-Based Foraging
    • 7.3 Operational Models in MARL
  8. Challenges in MARL
    • 8.1 Overview of Key Challenges
    • 8.2 Non-Stationarity
    • 8.3 Equilibrium Selection
    • 8.4 Joint Action Space Complexity
    • 8.5 Scalability to Many Agents
    • 8.6 Dynamic Programming for Multi-Agent Games: Value Iteration
    • 8.7 Temporal-Difference Learning: Joint-Action Learning
    • 8.8 Examples: Minimax-Q, Nash-Q, CE-Q
    • 8.9 Agent Modeling, Policy Reconstruction, and Best Response
    • 8.10 Fictitious Play
    • 8.11 Joint-Action Learning with Agent Modeling
    • 8.12 Policy-Based Learning in MARL
    • 8.13 Gradient Ascent in Expected Rewards
    • 8.14 Learning Dynamics of Infinitesimal Gradient Ascent
    • 8.15 WoLF-IGA and WoLF-PHC Algorithms
    • 8.16 No-Regret Learning Approaches
    • 8.17 Unconditional vs. Conditional Regret Matching
    • 8.18 Summary of MARL Challenges