Predictions, Policies, Rewards: Models of Decision-Making from Observational Data
OPEN ACCESS
Date
2025
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
While reinforcement learning has achieved success in solving well-defined decision-making problems, its application to optimizing complex human decisions remains a challenge. A promising use case is healthcare, where data-driven models could support the process of diagnosis or treatment. Modeling such decision-making problems is difficult due to the inherent complexity of real-world data, the high stakes of each decision's potential outcomes, and the ill-defined objectives of the tasks considered. In this thesis, we formalize and address these challenges in learning and optimizing models of decision-making. We structure our focus around three interdependent modeling paradigms: prediction, policy, and reward models.
First, we propose to improve prediction models of real-world environments, specifically focusing on patient trajectories in electronic health records. Deep learning architectures still perform poorly on clinical time-series data, due to high variation across feature types and sampling rates. We address these issues by leveraging the semantic heterogeneity and temporal structure of the data. This results in novel model architectures and objective functions that improve the performance of predictive models for this data modality.
Next, we explore how to derive policy models, describing what action to take in a given situation. The major challenge is to learn without direct environment interaction. Offline reinforcement learning and imitation learning are two frameworks for learning decision policies from observational data. We leverage these to obtain actionable policies that could be deployed for decision support – prioritizing reliability and interpretability. To ensure end-user adoption, effective policy models for such high-stakes applications must be robust to causal biases present in the data, and transparent in explaining the decision-making process. We design methods to achieve this and validate them on real and simulated medical tasks.
Finally, we consider the task of designing reward functions aligned with human objectives. In healthcare, desirable outcomes could represent patient survival, quality-adjusted life years, or the prevention of specific adverse events. Rather than manually formalizing such complex, multifaceted objectives, we focus on learning reward models based on human feedback. As this data may be expensive to collect, we develop methods that maximize the sample efficiency of the learning process by generating simulated trajectories and synthetic preferences – always in a fully observational setting. Our approach allows for general and scalable applications, including reward learning for language model alignment.
Motivated by healthcare but broadly applicable across domains, this thesis addresses fundamental challenges in learning models of human decision-making. It takes a step towards advancing the development of safe and effective decision support systems, helping to bridge the gap between machine learning research and real-world impact.
Publication status
published
Contributors
Examiner: Rätsch, Gunnar
Examiner: Schölkopf, Bernhard
Examiner: Ramponi, Giorgia
Examiner: Martius, Georg
Publisher
ETH Zurich
Subject
Machine Learning; Reinforcement Learning; Causal Inference; Machine Learning for Healthcare; Reinforcement Learning from Human Feedback
Organisational unit
09664 - Schölkopf, Bernhard / Schölkopf, Bernhard
09568 - Rätsch, Gunnar / Rätsch, Gunnar