David Lindner
Publications 1 - 8 of 8
- GOSAFEOPT: Scalable safe exploration for global optimization of dynamical systems
  Item type: Journal Article
  Artificial Intelligence. Sukhija, Bhavya; Turchetta, Matteo; Lindner, David; et al. (2023)
  Learning optimal control policies directly on physical systems is challenging: even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures during exploration, are limited to local optima. This work proposes GOSAFEOPT as the first provably safe and optimal algorithm that can safely discover globally optimal policies for systems with high-dimensional state spaces. We demonstrate the superiority of GOSAFEOPT over competing model-free safe learning methods in simulation and hardware experiments on a robot arm.
- Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework
  Item type: Working Paper
  arXiv. Metz, Yannick; Lindner, David; Baur, Raphaël; et al. (2024)
  Reinforcement learning from human feedback (RLHF) has become a powerful tool to fine-tune or train agentic machine learning models. Similar to how humans interact in social contexts, we can use many types of feedback to communicate our preferences, intentions, and knowledge to an RL agent. However, applications of human feedback in RL are often limited in scope and disregard human factors. In this work, we bridge the gap between machine learning and human-computer interaction efforts by developing a shared understanding of human feedback in interactive learning scenarios. We first introduce a taxonomy of feedback types for reward-based learning from human feedback based on nine key dimensions. Our taxonomy allows for unifying human-centered, interface-centered, and model-centered aspects. In addition, we identify seven quality metrics of human feedback that influence both the human's ability to express feedback and the agent's ability to learn from it. Based on the feedback taxonomy and quality criteria, we derive requirements and design choices for systems that learn from human feedback, and we relate these requirements and design choices to existing work in interactive machine learning. In the process, we identify gaps in existing work and future research opportunities. We call for interdisciplinary collaboration to harness the full potential of reinforcement learning with data-driven co-adaptive modeling and varied interaction mechanics.
- Learning What To Do by Simulating the Past
  Item type: Conference Paper
  International Conference on Learning Representations (ICLR 2021). Lindner, David; Shah, Rohin; Abbeel, Pieter; et al. (2021)
  Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are impeded by the expense of acquiring that feedback. Other recent work proposed that agents have access to a source of information that is effectively free: in any environment that humans have acted in, the state will already be optimized for human preferences, and thus an agent can extract information about what humans want from the state. Such learning is possible in principle but requires simulating all possible past trajectories that could have led to the observed state. This is feasible in gridworlds, but how do we scale it to complex tasks? In this work, we show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time and infer what humans must have done. The resulting algorithm can reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
- Interactively Learning Preference Constraints in Linear Bandits
  Item type: Conference Paper
  Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research). Lindner, David; Tschiatschek, Sebastian; Hofmann, Katja; et al. (2022)
  We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst case. In the average case, ACOL's sample complexity bound is still significantly tighter than the bounds of simpler approaches. In synthetic experiments, ACOL performs on par with an oracle solution and outperforms a range of baselines. As an application, we consider learning constraints to represent human preferences in a driving simulation. ACOL is significantly more sample efficient than alternatives for this application. Further, we find that learning preferences as constraints is more robust to changes in the driving scenario than encoding the preferences directly in the reward function.
- RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
  Item type: Conference Paper
  Interactive Learning with Implicit Human Feedback Workshop at ICML 2023. Metz, Yannick; Lindner, David; Baur, Raphaël; et al. (2023)
  To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider the human factors involved in providing feedback of different types. However, the systematic study of learning from diverse types of feedback is held back by the limited standardized tooling available to researchers. To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies considering the impact of human factors on their effectiveness. We discuss a set of concrete research opportunities enabled by RLHF-Blender. More information is available at https://rlhfblender.info/.
- Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback
  Item type: Doctoral Thesis
  Lindner, David (2023)
  Reinforcement learning (RL) has shown remarkable success in applications with well-defined reward functions, such as maximizing the score in a video game or optimizing an algorithm's run-time. However, in many real-world applications, there is no well-defined reward function. Instead, Reinforcement Learning from Human Feedback (RLHF) allows RL agents to learn from human-provided data, such as evaluations or rankings of trajectories. In many applications, human feedback is expensive to collect; therefore, learning robust policies from limited data is crucial.
  In this dissertation, we propose novel algorithms to enhance the sample efficiency and robustness of RLHF. First, we propose active learning algorithms that improve the sample efficiency of RLHF by selecting the most informative data points for the user to label and by exploring the environment guided by uncertainty about the user's preferences. Our approach provides conceptual clarity about active learning for RLHF and theoretical sample complexity results, drawing inspiration from multi-armed bandits and Bayesian optimization. Moreover, we provide extensive empirical evaluations in simulations that demonstrate the benefit of active learning for RLHF. Second, we extend RLHF to learning constraints from human preferences instead of, or in addition to, rewards. We argue that constraints are a particularly natural representation of human preferences, especially in safety-critical applications. We develop algorithms to learn constraints effectively from demonstrations with unknown rewards and to actively learn constraints from human feedback. Our results suggest that representing human preferences as constraints can lead to safer policies and extend the potential applications of RLHF. The proposed algorithms for reward and constraint learning serve as a foundation for future research to enhance the efficiency, safety, and applicability of RLHF.
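  The active-learning idea summarized in the thesis abstract, selecting the most informative comparison for the user to label, can be sketched as follows. This is a minimal illustration, not code from the dissertation: it assumes a Bradley-Terry preference model over trajectory returns and queries the pair whose predicted preference probability is closest to 0.5, i.e., the comparison the current reward model is least sure about.

  ```python
  import numpy as np

  def preference_prob(r_a: float, r_b: float) -> float:
      """Bradley-Terry model: probability a human prefers trajectory a over b."""
      return 1.0 / (1.0 + np.exp(-(r_a - r_b)))

  def most_informative_pair(returns: np.ndarray) -> tuple[int, int]:
      """Pick the trajectory pair whose predicted preference is closest to 0.5,
      i.e. the most uncertain (and thus most informative) comparison."""
      n = len(returns)
      best, best_gap = (0, 1), float("inf")
      for i in range(n):
          for j in range(i + 1, n):
              gap = abs(preference_prob(returns[i], returns[j]) - 0.5)
              if gap < best_gap:
                  best, best_gap = (i, j), gap
      return best

  # Estimated returns under the current reward model for four candidate trajectories.
  returns = np.array([0.1, 2.0, 0.15, -1.0])
  print(most_informative_pair(returns))  # → (0, 2): the most similar estimated returns
  ```

  Labeling such near-ties gives the reward model more information per query than labeling pairs it can already rank confidently.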
- Challenges for Using Impact Regularizers to Avoid Negative Side Effects
  Item type: Conference Paper
  Lindner, David; Matoba, Kyle; Meulemans, Alexander (2022)
  Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects and overall unsafe behavior. To overcome this problem, recent work proposed augmenting the specified reward function with an impact regularizer that discourages behavior with a large impact on the environment. Although initial results with impact regularizers seem promising for mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions for overcoming the unsolved challenges in preventing negative side effects with impact regularizers.
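  The reward augmentation described in the abstract above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the choice of baseline state and of deviation measure (here a plain Euclidean distance to a given baseline) are exactly the design decisions the paper identifies as open challenges.

  ```python
  import numpy as np

  def impact_regularized_reward(
      reward: float,
      state: np.ndarray,
      baseline_state: np.ndarray,
      lam: float = 0.5,
  ) -> float:
      """Augment the task reward with a penalty on deviation from a baseline state.

      The baseline (e.g. the state reached by taking no action) and the deviation
      measure are key design choices; a simple Euclidean distance is used here.
      """
      deviation = float(np.linalg.norm(state - baseline_state))
      return reward - lam * deviation

  # The agent keeps its full task reward if it stays at the inaction baseline...
  print(impact_regularized_reward(1.0, np.array([0.0, 0.0]), np.array([0.0, 0.0])))  # → 1.0
  # ...and is penalized for large side effects on the environment.
  print(impact_regularized_reward(1.0, np.array([3.0, 4.0]), np.array([0.0, 0.0])))  # → -1.5
  ```

  The weight `lam` trades off task performance against side-effect avoidance; setting it poorly either neutralizes the regularizer or prevents the agent from acting at all.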
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
  Item type: Journal Article
  Transactions on Machine Learning Research. Casper, Stephen; Davies, Xander; Shi, Claudia; et al. (2023)
  Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.