Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
dc.contributor.author
Sessa, Pier Giuseppe
dc.contributor.author
Kamgarpour, Maryam
dc.contributor.author
Krause, Andreas
dc.contributor.editor
Chaudhuri, Kamalika
dc.contributor.editor
Jegelka, Stefanie
dc.contributor.editor
Song, Le
dc.contributor.editor
Szepesvári, Csaba
dc.contributor.editor
Niu, Gang
dc.contributor.editor
Sabato, Sivan
dc.date.accessioned
2023-01-09T15:14:36Z
dc.date.available
2023-01-09T14:03:01Z
dc.date.available
2023-01-09T15:01:30Z
dc.date.available
2023-01-09T15:14:36Z
dc.date.issued
2022-07
dc.identifier.issn
2640-3498
dc.identifier.uri
http://hdl.handle.net/20.500.11850/591032
dc.identifier.doi
10.3929/ethz-b-000591032
dc.description.abstract
We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that balances exploration, i.e., learning about the environment, and exploitation, i.e., achieving good equilibrium performance in the underlying general-sum Markov game. H-MARL builds high-probability confidence intervals around the unknown transition model and sequentially updates them based on newly observed data. Using these intervals, it constructs an optimistic hallucinated game for the agents, for which equilibrium policies are computed at each round. We consider general statistical models (e.g., Gaussian processes, deep ensembles) and policy classes (e.g., deep neural networks), and theoretically analyze our approach by bounding the agents’ dynamic regret. Moreover, we provide a convergence rate to the equilibria of the underlying Markov game. We demonstrate our approach experimentally on an autonomous driving simulation benchmark. H-MARL learns successful equilibrium policies after a few interactions with the environment and significantly improves performance compared to non-optimistic exploration methods.
en_US
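The abstract above outlines H-MARL's round-based loop: build high-probability confidence intervals around the unknown transition model, plan equilibrium policies in an optimistic hallucinated game, and update the model with newly observed transitions. The Python sketch below is a heavily simplified, illustrative rendering of that loop under assumed toy settings (tabular dynamics, count-based confidence widths, and a joint-greedy back-up standing in for the equilibrium computation); every name and size in it is an assumption, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of an optimistic model-based
# multi-agent loop: keep confidence bounds on the unknown transitions, plan
# optimistically in a "hallucinated" game, then update the model with data.
import numpy as np

rng = np.random.default_rng(0)
S, A1, A2, H, ROUNDS = 4, 2, 2, 5, 20                  # toy sizes (assumed)
true_P = rng.dirichlet(np.ones(S), size=(S, A1, A2))   # unknown dynamics
reward = rng.uniform(size=(S, A1, A2))                  # toy shared reward

counts = np.ones((S, A1, A2, S))                        # transition pseudo-counts

def confidence_model(counts):
    """Mean transition estimate plus a crude count-based confidence width."""
    n = counts.sum(axis=-1, keepdims=True)
    mean = counts / n
    width = np.sqrt(1.0 / n)                            # ~1/sqrt(n) radius
    return mean, width

def optimistic_plan(mean, width):
    """H-step optimistic back-up in the hallucinated game (toy stand-in
    for the equilibrium computation: joint-greedy, cooperative back-up)."""
    V = np.zeros(S)
    for _ in range(H):
        # optimism: add a bonus proportional to the confidence width
        Q = reward + np.einsum('sabt,t->sab', mean, V) \
                   + width.squeeze(-1) * V.max()
        V = Q.max(axis=(1, 2))
    return Q

for _ in range(ROUNDS):
    mean, width = confidence_model(counts)              # confidence intervals
    Q = optimistic_plan(mean, width)                    # hallucinated game plan
    s = rng.integers(S)
    for _ in range(H):                                  # roll out the joint policy
        a1, a2 = np.unravel_index(Q[s].argmax(), (A1, A2))
        s_next = rng.choice(S, p=true_P[s, a1, a2])
        counts[s, a1, a2, s_next] += 1                  # sequential model update
        s = s_next
```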
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
PMLR
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.title
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
en_US
dc.type
Conference Paper
dc.rights.license
In Copyright - Non-Commercial Use Permitted
ethz.book.title
Proceedings of the 39th International Conference on Machine Learning
en_US
ethz.journal.title
Proceedings of Machine Learning Research
ethz.journal.volume
162
en_US
ethz.pages.start
19580
en_US
ethz.pages.end
19597
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.event
39th International Conference on Machine Learning (ICML 2022)
en_US
ethz.event.location
Baltimore, MD, USA
en_US
ethz.event.date
July 17-23, 2022
en_US
ethz.grant
NCCR Automation (phase I)
en_US
ethz.grant
Reliable Data-Driven Decision Making in Cyber-Physical Systems
en_US
ethz.publication.place
Cambridge, MA
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02650 - Institut für Automatik / Automatic Control Laboratory::09578 - Kamgarpour, Maryam (ehemalig) / Kamgarpour, Maryam (former)
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02650 - Institut für Automatik / Automatic Control Laboratory::09578 - Kamgarpour, Maryam (ehemalig) / Kamgarpour, Maryam (former)
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas
en_US
ethz.identifier.url
https://proceedings.mlr.press/v162/sessa22a.html
ethz.grant.agreementno
180545
ethz.grant.agreementno
815943
ethz.grant.fundername
SNF
ethz.grant.fundername
EC
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.funderDoi
10.13039/501100000780
ethz.grant.program
H2020
ethz.grant.program
NCCR full proposal
ethz.date.deposited
2023-01-09T14:03:02Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-01-09T15:14:37Z
ethz.rosetta.lastUpdated
2024-02-02T19:18:19Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Efficient%20Model-based%20Multi-agent%20Reinforcement%20Learning%20via%20Optimistic%20Equilibrium%20Computation&rft.jtitle=Proceedings%20of%20Machine%20Learning%20Research&rft.date=2022-07&rft.volume=162&rft.spage=19580&rft.epage=19597&rft.issn=2640-3498&rft.au=Sessa,%20Pier%20Giuseppe&Kamgarpour,%20Maryam&Krause,%20Andreas&rft.genre=proceeding&rft.btitle=Proceedings%20of%20the%2039th%20International%20Conference%20on%20Machine%20Learning