04.29
Instructor: Yaodong Yang
Topics Covered
- Introduction
  - 1.1 MARL Applications
  - 1.2 Multi-Agent Systems
  - 1.3 Challenges in MARL
  - 1.4 Multi-Agent Credit Assignment
- Game Models
  - 2.1 Normal-Form Games
  - 2.2 Classes of Games
  - 2.3 Repeated Normal-Form Games
  - 2.4 Stochastic Games
  - 2.5 Example: Level-Based Foraging
  - 2.6 Partially Observable Stochastic Games (POSG)
    - 2.6.1 Formal Definition
    - 2.6.2 Interaction Process
- Modeling Communication
  - 3.1 Communication Modeling
  - 3.2 Communication as Actions
  - 3.3 Communication in Stochastic Games
  - 3.4 Communication in POSGs
- Assumptions in Games
  - 4.1 Assumptions in Game Theory
  - 4.2 Assumptions in MARL
  - 4.3 Reinforcement Learning vs. Game Theory
  - 4.4 Solution Concepts in Games
- Joint Policy and Expected Return
  - 5.1 Joint Policy Representation
  - 5.2 History-Based Expected Return
  - 5.3 Recursive Expected Return
- MARL Learning Framework
  - 6.1 Learning Process in MARL
  - 6.2 Policy Inputs
  - 6.3 Convergence Analysis
  - 6.4 Reducing to Single-Agent Reinforcement Learning
    - 6.4.1 Centralized Learning & Centralized Q-Learning
    - 6.4.2 Independent Learning
- Independent Learning
  - 7.1 Independent Q-Learning (IQL)
  - 7.2 IQL and Centralized Q-Learning in Level-Based Foraging
  - 7.3 Operational Models in MARL
- Challenges in MARL
  - 8.1 Overview of Key Challenges
  - 8.2 Non-Stationarity
  - 8.3 Equilibrium Selection
  - 8.4 Joint Action Space Complexity
  - 8.5 Scalability to Many Agents
  - 8.6 Dynamic Programming for Multi-Agent Games: Value Iteration
  - 8.7 Temporal-Difference Learning: Joint-Action Learning
  - 8.8 Examples: Minimax-Q, Nash-Q, CE-Q
  - 8.9 Agent Modeling, Policy Reconstruction, and Best Response
  - 8.10 Fictitious Play
  - 8.11 Joint-Action Learning with Agent Modeling
  - 8.12 Policy-Based Learning in MARL
  - 8.13 Gradient Ascent in Expected Rewards
  - 8.14 Learning Dynamics of Infinitesimal Gradient Ascent
  - 8.15 WoLF-IGA and WoLF-PHC Algorithms
  - 8.16 No-Regret Learning Approaches
  - 8.17 Unconditional vs. Conditional Regret Matching
  - 8.18 Summary of MARL Challenges