Predictions, Policies, Rewards: Models of Decision-Making from Observational Data


Date

2025

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

While reinforcement learning has achieved success in solving well-defined decision-making problems, its application to optimizing complex human decisions remains a challenge. A promising use case would be in healthcare, where data-driven models could support the process of diagnosis or treatment. Modeling such decision-making problems is difficult due to the inherent complexity of real-world data, the high stakes of each decision's potential outcomes, and the ill-defined objectives of the tasks considered. In this thesis, we formalize and address these challenges to learning and optimizing models of decision-making. We structure our focus around three interdependent modeling paradigms: prediction, policy, and reward models. First, we propose to improve prediction models of real-world environments, specifically focusing on patient trajectories in electronic health records. Deep learning architectures still perform poorly on clinical time-series data, due to high variation across feature types and sampling rates. We address these issues by leveraging the semantic heterogeneity and temporal structure of the data. This results in novel model architectures and objective functions that improve the performance of predictive models for this data modality. Next, we explore how to derive policy models, describing what action to take in a given situation. The major challenge is to learn without direct environment interaction. Offline reinforcement learning and imitation learning are two frameworks for learning decision policies from observational data. We leverage these to obtain actionable policies that could be deployed for decision support – prioritizing reliability and interpretability. To ensure end-user adoption, effective policy models for such high-stakes applications must be robust to causal biases present in the data, and transparent in explaining the decision-making process. We design methods to achieve this and validate them on real and simulated medical tasks. 

Finally, we consider the task of designing reward functions aligned with human objectives. In healthcare, desirable outcomes could represent patient survival, quality-adjusted life years, or the prevention of specific adverse events. Rather than manually formalizing such complex, multifaceted objectives, we focus on learning reward models based on human feedback. As this data may be expensive to collect, we develop methods that maximize the sample efficiency of the learning process by generating simulated trajectories and synthetic preferences – always in a fully observational setting. Our approach allows for general and scalable applications, including reward learning for language model alignment. Motivated by healthcare but broadly applicable across domains, this thesis addresses fundamental challenges in learning models of human decision-making. It takes a step towards advancing the development of safe and effective decision support systems, helping to bridge the gap between machine learning research and real-world impact.

Publication status

published

Contributors

Examiner: Rätsch, Gunnar
Examiner: Schölkopf, Bernhard
Examiner: Ramponi, Giorgia
Examiner: Martius, Georg

Publisher

ETH Zurich

Subject

Machine Learning; Reinforcement Learning; Causal Inference; Machine Learning for Healthcare; Reinforcement Learning from Human Feedback

Organisational unit

09664 - Schölkopf, Bernhard / Schölkopf, Bernhard
09568 - Rätsch, Gunnar / Rätsch, Gunnar
