Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback
dc.contributor.author
Lindner, David
dc.contributor.supervisor
Krause, Andreas
dc.contributor.supervisor
Hofmann, Katja
dc.contributor.supervisor
Sadigh, Dorsa
dc.date.accessioned
2023-10-06T10:15:02Z
dc.date.available
2023-10-05T20:48:08Z
dc.date.available
2023-10-05T21:10:23Z
dc.date.available
2023-10-06T06:22:45Z
dc.date.available
2023-10-06T10:15:02Z
dc.date.issued
2023
dc.identifier.uri
http://hdl.handle.net/20.500.11850/635156
dc.identifier.doi
10.3929/ethz-b-000635156
dc.description.abstract
Reinforcement learning (RL) has shown remarkable success in applications with well-defined reward functions, such as maximizing the score in a video game or optimizing an algorithm’s run-time. However, in many real-world applications, there is no well-defined reward function. Instead, Reinforcement Learning from Human Feedback (RLHF) allows RL agents to learn from human-provided data, such as evaluations or rankings of trajectories. In many applications, human feedback is expensive to collect; therefore, learning robust policies from limited data is crucial. In this dissertation, we propose novel algorithms to enhance the sample efficiency and robustness of RLHF.
First, we propose active learning algorithms to improve the sample efficiency of RLHF by selecting the most informative data points for the user to label and by exploring the environment guided by uncertainty about the user’s preferences. Our approach provides conceptual clarity about active learning for RLHF and theoretical sample complexity results, drawing inspiration from multi-armed bandits and Bayesian optimization. Moreover, we provide extensive empirical evaluations in simulations that demonstrate the benefit of active learning for RLHF.
Second, we extend RLHF to learning constraints from human preferences instead of, or in addition to, rewards. We argue that constraints are a natural representation of human preferences, particularly in safety-critical applications. We develop algorithms to learn constraints effectively from demonstrations with unknown rewards and to actively learn constraints from human feedback. Our results suggest that representing human preferences as constraints can lead to safer policies and extend the potential applications for RLHF.
The proposed algorithms for reward and constraint learning serve as a foundation for future research to enhance the efficiency, safety, and applicability of RLHF.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.subject
reinforcement learning
en_US
dc.subject
Inverse reinforcement learning
en_US
dc.subject
preference learning
en_US
dc.subject
reinforcement learning from human feedback
en_US
dc.title
Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback
en_US
dc.type
Doctoral Thesis
dc.rights.license
Creative Commons Attribution 4.0 International
dc.date.published
2023-10-06
ethz.size
258 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.diss
29577
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas
en_US
ethz.date.deposited
2023-10-05T20:48:08Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-10-06T10:15:03Z
ethz.rosetta.lastUpdated
2024-02-03T04:31:13Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Algorithmic%20Foundations%20for%20Safe%20and%20Efficient%20Reinforcement%20Learning%20from%20Human%20Feedback&rft.date=2023&rft.au=Lindner,%20David&rft.genre=unknown&rft.btitle=Algorithmic%20Foundations%20for%20Safe%20and%20Efficient%20Reinforcement%20Learning%20from%20Human%20Feedback