
Open access
Author:
Date: 2021
Type: Doctoral Thesis
ETH Bibliography: yes
Abstract
Sequential decision-making is an iterative process between a learning agent and an environment. We study the stochastic setting, where the learner chooses an action in each round and the environment returns a noisy feedback signal. The learner's objective is to maximize a reward function that depends on the chosen actions. This basic model has many applications, including adaptive experimental design, product recommendations, dynamic pricing and black-box optimization.
The combination of statistical uncertainty and the objective of maximizing reward creates a tension between exploration and exploitation: the learner must carefully balance actions that provide informative feedback against actions estimated to yield high reward. The fields of bandit algorithms and partial monitoring study methods to resolve this exploration-exploitation trade-off optimally, under various regularity assumptions on the feedback-reward structure.
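The interaction loop described above can be sketched in a few lines. The snippet below simulates a stochastic multi-armed bandit with epsilon-greedy action selection, one simple (and generally suboptimal) way to trade off exploration against exploitation; the arm means, noise level, and epsilon are illustrative placeholders, not values from the thesis.

```python
import numpy as np

def run_epsilon_greedy(true_means, n_rounds=1000, eps=0.1, noise=0.1, seed=0):
    """Simulate the sequential decision-making loop: each round the learner
    picks an action, the environment returns a noisy reward, and the learner
    updates its estimates. With probability eps the learner explores (random
    action); otherwise it exploits the empirically best action."""
    rng = np.random.default_rng(seed)
    n = len(true_means)
    counts, sums = np.zeros(n), np.zeros(n)
    for _ in range(n_rounds):
        if counts.min() == 0 or rng.random() < eps:
            a = int(rng.integers(n))           # explore: uniformly random action
        else:
            a = int(np.argmax(sums / counts))  # exploit: highest empirical mean
        reward = true_means[a] + noise * rng.normal()  # noisy feedback signal
        counts[a] += 1
        sums[a] += reward
    return counts, sums

counts, sums = run_epsilon_greedy(np.array([0.2, 0.5, 0.8]))
```

After enough rounds the best arm accumulates most of the pulls, but the constant exploration rate is exactly the kind of unstructured exploration that the structured methods discussed below aim to improve on.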
Two of the most widely used methods are optimistic algorithms and Thompson sampling, which have been successfully applied in numerous settings and come with strong theoretical guarantees. More recently, however, a growing body of evidence shows that optimism and Thompson sampling are not universal exploration principles. In structured models with correlated feedback, clearly suboptimal actions sometimes provide informative feedback that outweighs their cost; optimistic approaches and Thompson sampling discard such actions early on, which leads to inefficient exploration.
An alternative and less studied design principle is information-directed sampling (IDS), originally proposed in the Bayesian setting. The main contribution of this thesis is a frequentist interpretation of the IDS framework, complemented with frequentist performance guarantees for several settings with linear reward and feedback structure. Using the IDS approach, we resolve the long-standing challenge of finding an asymptotically instance-optimal algorithm for linear bandits that is simultaneously minimax optimal. We further extend the IDS approach to the more general linear partial monitoring setting, making the method applicable to a vast range of previously studied models for sequential decision-making. Along the way, we develop the theory of information-directed sampling, uncover a connection to primal-dual methods, and propose computationally faster approximations.
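At a high level, IDS samples actions from the distribution that minimizes the information ratio: squared expected regret divided by information gain. A standard observation is that a minimizer supported on at most two actions always exists, so a grid search over action pairs suffices for a finite action set. The sketch below illustrates only this optimization step; the regret and information-gain estimates passed in are placeholder inputs, and how they are computed (the core of the frequentist IDS framework) is not shown.

```python
import numpy as np

def ids_distribution(deltas, gains, grid=1001):
    """Minimize the information ratio psi(p) = <p, deltas>^2 / <p, gains>
    over probability distributions p, searching mixtures of two actions
    (a two-point minimizer always exists). deltas[i] is an estimate of the
    regret of action i, gains[i] of its information gain."""
    n = len(deltas)
    best_ratio, best_p = np.inf, None
    qs = np.linspace(0.0, 1.0, grid)  # weight on the first action of the pair
    for i in range(n):
        for j in range(n):
            d = qs * deltas[i] + (1 - qs) * deltas[j]  # expected regret
            g = qs * gains[i] + (1 - qs) * gains[j]    # expected info gain
            with np.errstate(divide="ignore", invalid="ignore"):
                psi = np.where(g > 0, d ** 2 / g, np.inf)
            k = int(np.argmin(psi))
            if psi[k] < best_ratio:
                p = np.zeros(n)
                p[i] = qs[k]
                p[j] += 1 - qs[k]
                best_ratio, best_p = float(psi[k]), p
    return best_ratio, best_p

# Toy inputs: action 0 has higher regret but is far more informative.
ratio, p = ids_distribution(np.array([1.0, 0.5]), np.array([2.0, 0.25]))
```

On these toy numbers the best deterministic choice attains ratio 0.5, while a mixture of the two actions attains roughly 0.49: a small example of why IDS deliberately randomizes, sometimes over clearly suboptimal but informative actions.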
Lastly, we discuss extensions of the IDS framework to contextual decision-making and the kernelized setting, and highlight example applications.
Permanent link: https://doi.org/10.3929/ethz-b-000494381
Publication status: published
External links: Search print copy at ETH Library
Publisher: ETH Zurich
Subject: Sequential Decision Making; Partial Monitoring; Information-Directed Sampling
Organisational unit: 03908 - Krause, Andreas / Krause, Andreas