Stochastic multi-armed bandits: optimality, complexity, robustness, and risk-sensitivity
Authors
Mukherjee, Arpan
Issue Date
2024-12
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Electrical engineering
Alternative Title
Abstract
Multi-armed bandit problems serve as fundamental models in sequential experimental design, where the goal is to balance exploration and exploitation to maximize cumulative rewards. These problems are pivotal in various real-world applications, including clinical trials, online advertising, and recommendation systems. Two primary paradigms have emerged in the study of multi-armed bandits: regret minimization, where the objective is to minimize the difference between the rewards obtained and those that would have been obtained by always choosing the best arm, and best arm identification (BAI), which aims to accurately identify the arm with the highest mean reward. This dissertation examines both paradigms, addressing some unresolved questions in optimal and efficient experiment design. It also adopts the viewpoints of decision safety and risk sensitivity, addressing some of the practical challenges in sequential experimental design. In the context of BAI, existing optimal algorithms are computationally expensive. These algorithms compute an optimal allocation over the arms by solving a minimax optimization problem in each round, which introduces significant computational challenges. Conversely, existing efficient algorithms for BAI are not optimal. These algorithms allocate a pre-chosen fraction of samples to the best arm, specified by a tunable parameter $\beta$, and their optimality depends on selecting an appropriate value of $\beta$. This leaves a gap between efficiency and optimality. This dissertation addresses the gap by introducing algorithms that are both optimal and computationally efficient. The efficiency comes from implicitly estimating the optimal value of $\beta$ without having to solve any optimization problem.
Furthermore, while the existing literature predominantly concentrates on the exponential family of distributions, this approach extends to a broader range of distributions parameterized by their mean values, subject to mild regularity conditions. As the second focus, this dissertation examines decision safety in stochastic bandits. In applications such as recommender systems, user feedback is often malicious, and in clinical trials, experiment results are susceptible to human error. These fluctuations in the reward are modeled through an oblivious adversary that is capable of replacing the reward with adversarial samples for a fraction of the rounds, a setting known as Huber's contamination model. Viewing the problem from a robustness perspective, this dissertation introduces two algorithms: a gap-based algorithm and a successive elimination-based algorithm. Performance guarantees for these algorithms show that they are robust to adversarial contamination and may achieve optimal sample complexity (up to constant factors). Hence, this dissertation takes a step towards decision safety in practical applications involving experimental design. As the third focus, this dissertation investigates human risk-taking behavior in decision-making contexts. While most existing algorithms aim to maximize the average reward, this is not always the objective in human-centric applications. For instance, in high-frequency trading, decision-making revolves around the principle of high risk and high reward, in contrast to maximizing the average reward over a period of time. Designing experiments in such risk-sensitive settings involves significantly different approaches from those used in the canonical risk-neutral setting. This dissertation proposes a framework for risk-sensitive bandits and lays down the theoretical context and real-world applications in which decision safety is critical to the algorithm design.
Description
December 2024
School of Engineering
Full Citation
Publisher
Rensselaer Polytechnic Institute, Troy, NY