Planning under model uncertainty is a fundamental problem across many
applications of decision making and learning. In this paper, we propose the
Robust Adaptive Monte Carlo Planning (RAMCP) algorithm, which allows
computation of risk-sensitive Bayes-adaptive policies that optimally trade off
exploration, exploitation, and robustness. RAMCP formulates the risk-sensitive
planning problem as a two-player zero-sum game, in which an adversary perturbs
the agent’s belief over the models. We introduce two versions of the RAMCP
algorithm. The first, RAMCP-F, converges to an optimal risk-sensitive policy
without having to rebuild the search tree as the underlying belief over models
is perturbed. The second version, RAMCP-I, improves computational efficiency at
the cost of losing theoretical guarantees, but is shown to yield empirical
results comparable to RAMCP-F. RAMCP is demonstrated on an n-pull multi-armed
bandit problem, as well as a patient treatment scenario.
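
To make the adversarial-belief idea concrete, here is a minimal, hypothetical sketch: an adversary perturbs the agent's belief over candidate models (here via an exponential tilt toward models where the agent's action does poorly, controlled by a temperature tau), and the agent picks the action with the best worst-case, belief-weighted value. The names, the q_table layout, and the specific tilt rule are illustrative assumptions, not RAMCP's actual search or update.

```python
import numpy as np

def adversarial_belief(belief, action_values, tau):
    """Tilt the belief toward models where the given action performs poorly.

    This is a stand-in for the adversary's belief perturbation; RAMCP's own
    perturbation is defined by its zero-sum game formulation.
    """
    tilted = belief * np.exp(-action_values / tau)
    return tilted / tilted.sum()

def robust_action(belief, q_table, tau):
    """Pick the action with the highest value under its own worst-case belief.

    q_table[m, a] = value of action a if model m were the true model.
    """
    best_a, best_v = None, -np.inf
    for a in range(q_table.shape[1]):
        b_adv = adversarial_belief(belief, q_table[:, a], tau)
        v = float(b_adv @ q_table[:, a])  # worst-case expected value of action a
        if v > best_v:
            best_a, best_v = a, v
    return best_a, best_v

# Two candidate models, two actions: action 0 is great under model 0 but bad
# under model 1; action 1 is mediocre but safe. The adversarial tilt shifts
# weight toward model 1, so the robust choice is the safer action.
belief = np.array([0.6, 0.4])
q_table = np.array([[10.0, 3.0],
                    [-5.0, 2.0]])
print(robust_action(belief, q_table, tau=1.0))
```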
