2024: Multi-agent Learning

Instructor: Eugene Vinitsky

Course details

Meeting room: 2 Metrotech Center, Room 805, Wednesdays, 6:00-8:30pm
Office hours: 6 Metrotech Center, Room 459, Wednesdays and Fridays, 10-11 AM.

Course Breakdown

This course is a graduate seminar intended to get you up to speed on the state of the art as quickly as possible, so that you are able to read papers in, or participate in, multi-agent learning research. As such, it is by necessity not an in-depth course on any particular topic, and the emphasis is on reading papers and course projects rather than homework. This means that as you go through the material, you will often find that you do not fully understand it. That is an intended outcome; this course is not about the basics but about giving you a roadmap of multi-agent learning research, along with just enough comfort to dive in deeper with confidence.

By the end of the course, I expect to have covered:

Why does multi-agent learning matter? Why is it relevant to you as an engineer?
Different notions of equilibria in multi-agent learning
What are the key challenges in multi-agent learning?
Basics of RL
Techniques for converging towards equilibria (CFR, MARL)
How some state-of-the-art agents were developed (StarCraft, poker, Stratego)
Open challenges and problems in multi-agent learning

As two and a half hours is a very long time to listen to a lecture, the second half of many of the classes will be a reading group. For more details on how the reading groups will work, see Reading Group Details. Please read this entire document; I've put a lot of work into it, and understanding its logic should help you engage with the course more deeply.

Course Project

This is a graduate course intended to help you incorporate ideas from multi-agent learning into your research. As such, the primary element of the course is a project that you will develop over the duration of the class. The project has three elements:

Project proposal (2/28)
Mid-semester checkpoint (3/27)
Final project write-up (5/1)

The final project is intended to be a paper written in the style of a submission to RLC (the Reinforcement Learning Conference). See the associated style file on the RLC website. The paper should be roughly eight pages, though it can be longer, and should read like a research paper: it should demonstrate a new result, describe the construction of an engineering project in detail, or present an in-depth investigation of a topic. As an alternative to new work, you may instead write a clear explanation of an existing paper that would allow a beginner to understand it; see the ICLR blog post track for examples of what I mean. Longer papers will not receive additional credit for their length: eight pages is the expectation, and the longer limit is merely there to give you extra space if you need it. Similarly, if a substantive result can be described in fewer than eight pages, that is also fine.

Project Proposal

The project proposal should be a one-page LaTeX document outlining a concrete research question or engineering task. I'm being insistent about LaTeX here because if you don't know LaTeX yet, you probably need to learn it at this point. The proposal is due quite early in the course, possibly before you have all of the relevant background, so that I can help you refine it.

Mid-Semester Checkpoint

At this point, I expect you to have a three-page write-up that outlines your progress so far, any open questions that you have not been able to resolve, and a list of the work that remains to be done before the final project is complete.

Reading Group Details

Reading groups can be fun and engaging or very dull. To give everyone a reason to be engaged, we're going to use the format from Colin Raffel's "Role-playing reading group seminars." It is assigned as reading for the second class because we will start using the format before the third class. Note that the hacker role will not be available every week.

Additionally, before each reading group I expect to receive a short summary of the paper, written in a format I will provide. The reasoning for this is that I want you to get used to taking a long paper and condensing it into a few key ideas that you could easily explain to someone else. Writing a summary also lets me give you feedback on how well you understand the paper. I do not expect this to take more than a few minutes if you have read the paper. Note that you will not be graded on whether your summary is correct, only on whether you have done it.

Because of highly variable backgrounds and the logic of the course organization, some of you will not have the necessary background for some of the papers. This also means that some of the reading-group papers will be challenging very early on! My suggestion is to view this as an exciting challenge, much like your first year attending a seminar in a new field. When you start such a seminar, you can only catch fragments of what is being discussed; the pace is simply too fast and too much of the material is new. Keep in mind that learning is a process; you don't transition from not knowing to knowing in a single leap.

As such, your goal in each reading group is to take away three new things. These can be new ideas, new techniques, or new questions. This way, even if you don't understand everything, you will leave more knowledgeable than when you arrived.

Grading

This is a graduate course and grading is intended to be fairly lenient; the expectation is that you are excited to learn and do not need to be cajoled into it by fear of a bad grade. As such, most of the grade comes from participation and the course project. If you do the work, you should expect to receive a very good grade. There will be two straightforward homework assignments during the course and a small number of quizzes that are intended to mimic spaced repetition and help you assess whether you have understood the material. The grading will be as follows:

5% paper summaries before reading group
5% the quizzes
5% the homeworks
20% project proposal
25% mid-project checkpoint
40% final project

Cheating Policy

Cheating is obviously not allowed. Copying answers or code from another student or from the internet constitutes cheating. However, collaborating with another student is allowed as long as you name the student you collaborated with and your answers and writing are in your own words.

ChatGPT Policy

I love ChatGPT and use it all the time. However, one goal of learning is to develop fluency: the ability to come up with ideas and use tools and knowledge without reference to an external data store. This is very similar to developing fluency in a language; imagine if, instead of learning a foreign language, you tried to look up every word in a dictionary! The same thing happens in research; there are ideas you want to be able to pull out without having to look them up every time. I will try to make clear in the course what those foundational concepts are. A similar thing applies to writing; you want to be able to write quickly and thoughtfully, and the only way to get there is practice and repetition.

Using ChatGPT to skip steps delays the development of fluency. In this way, overusing ChatGPT will make you a worse researcher in the long term and harm your educational experience. As such, my rules for ChatGPT are the following:

You may use ChatGPT as a learning tool, to ask it questions about material in the hope of receiving useful explanations that are tailored to you. Note that just like anything else on the internet, these explanations may be wrong!
I ask that you not use ChatGPT as a writing tool. This includes using it to sketch out the form of your project or to draft a preliminary version of it. Even if you would do this in practice, the goal here is to develop the skill of writing research!
You should not be using ChatGPT to write the answers to homeworks. Asking ChatGPT to write any part of the solutions to these problems will be considered cheating.

I acknowledge that there's basically no way for me to check whether you have followed these rules. My hope is that you will use it in the ways outlined above because the alternative will harm your development as a researcher, not out of fear of consequences.

Office Hours

I will hold a one-hour office hour twice a week; see Course details for the time and location. Please use this time to come talk to me about homework, research, or anything else. It's your time to use and I am excited to talk to you!

Inclusion Statement

The NYU Tandon School of Engineering values an inclusive and equitable environment for all our students. I hope to foster a sense of community in this class and consider it a place where individuals of all backgrounds, beliefs, ethnicities, national origins, gender identities, sexual orientations, religious and political affiliations, and abilities will be treated with respect. It is my intent that all students’ learning needs be addressed, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. If this standard is not being upheld, please feel free to speak with me.

Moses Center Statement of Disability

If you are a student with a disability who is requesting accommodations, please contact New York University’s Moses Center for Students with Disabilities at 212-998-4980 or mosescsd@nyu.edu. You must be registered with CSD to receive accommodations. Information about the Moses Center can be found at www.nyu.edu/csd. The Moses Center is located at 726 Broadway on the 2nd floor.

Course Schedule - Topics

A quick overview of the logic of the course schedule: based on preliminary discussions, not everyone entering the course has the RL background needed for the multi-agent learning material that their projects will build on. However, we need to cover some multi-agent learning material early so that you actually have some tools and ideas with which to propose your project! As such, we're going to start with a quick overview of some multi-agent topics, detour for a few weeks through an overview of RL, and then return to the multi-agent material.

Weekly requirements TBD.
This lecture is slack time, to be used if prior topics need more time.
Guest lecture TBD

Date	Topics Covered	Expected Learning Outcomes	Weekly Requirements	Course Materials
1/24	Why is multi-agent learning interesting or useful? An overview of learning and multi-agent systems	Identify differences between learning in multi-agent and single-agent settings; Challenges of multi-agent learning; Applicability in your own work	Read "If multi-agent learning is the answer, what is the question?"; Read "Multi-agent learning for engineers"	Slides; PDF of slides
1/31	Normal form games; Different notions of equilibria; Solving normal form games	Understanding normal form games; Defining convergence in multi-agent learning; Solving for Nash equilibria in simple settings	Read Chapter 3 of the Multi-agent Systems book; Read "Role-playing reading group seminars"	Lecture Notes
2/7	Introduction to RL notation; Markov decision processes; Zeroth-order algorithms	Understanding how to frame problems as MDPs; Basic tools for learning a policy in an MDP	Read Sutton and Barto, Chp. 2 and 3; Read "Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research"	Lecture Notes
2/14	Extensions of Markov decision processes; Value iteration; Q-learning	Understanding POMDPs; Coding basic RL algorithms	Homework 1 (due 3/6); Read Sutton and Barto, Chp. 4; Reading group: "Open Problems in Cooperative AI"	Lecture Notes
2/21	Actor-critic algorithms; Value iteration based algorithms	Code up a basic Q-learning agent and a REINFORCE agent	Read Sutton and Barto, Chp. 13; Reading group: None
2/28	Extensive form games; Zero-sum games; Tree search procedures (minimax search, Monte Carlo tree search)	Understanding of extensive form games	Read Chapter 5 of the Multi-agent Systems book; Reading group: "Dyna, an Integrated Architecture for Learning, Planning, and Reacting"
3/6	Imperfect information games; Regret minimization; Follow-the-regularized-leader algorithms	Understanding of the listed topics	Project proposal due; Reading group: "Mastering the Game of Go without Human Knowledge"
3/13	Counterfactual regret minimization (CFR); MCCFR; Fictitious play	Basics of the state of the art in imperfect information games	Homework 2 (due 3/20); Read lecture notes and "Regret Minimization in Games with Incomplete Information"; Reading group: "Real-world games look like spinning tops"
3/20	Break!
3/27	Centralized training, decentralized execution; Value decomposition networks	Usefulness of centralized training; Centralized training, decentralized execution	Reading group: "Monotonic value function factorisation for deep multi-agent reinforcement learning"
4/3	Opponent shaping; Population methods	Challenges in opponent shaping methods; Approaches in population methods	Read lecture notes; Reading group: "Learning with Opponent-Learning Awareness"; Project mid-point due
4/10	Mean field games	Challenges of learning with many agents and solutions through mean field games	Mini-presentations: form a group of two, find a paper relevant to the group, and give a 5-minute presentation on it
4/17	Ad-hoc team play	Characterizations of performant ad-hoc team play
4/24	Regularized learning methods	Tools and techniques to improve convergence in MARL methods	Guest lecture by Samuel Sokota; Reading group: "Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning"
5/1	Course project presentations!