On Thompson Sampling with Langevin Algorithms

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)

Bibtex »Metadata »Paper »Supplemental »

Bibtek download is not availble in the pre-proceeding


Authors

Eric Mazumdar, Aldo Pacchiano, Yian Ma, Michael Jordan, Peter Bartlett

Abstract

<p>Thompson sampling has been demonstrated both theoretically and empirically to enjoy favorable performance in tackling multi-armed bandits problems. Despite its successes, however, one key obstacle to its use in a much broader range of scenarios is the need for perfect samples from posterior distributions at every iteration, which is oftentimes not feasible in practice. We propose a Markov Chain Monte Carlo (MCMC) method tailored to Thompson sampling to address this issue. We construct a fast converging Langevin algorithm to generate approximate samples with accuracy guarantees. We then leverage novel posterior concentration rates to analyze the statistical risk of the overall Thompson sampling method. Finally, we specify the necessary hyperparameters and the required computational resources for the MCMC procedure to match the optimal risk. The resulting algorithm enjoys both optimal instance-dependent frequentist regret and appealing computation complexity.</p>