Tianyu Wang, Cynthia Rudin
We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with singularities in the domain. For example, BMO functions can model the intensity field of several radiation-emitting sources. A BMO bandit algorithm can help us quickly locate the strongest emitting source. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $\delta$-regret, that is, regret measured against an arm that is optimal after removing a $\delta$-sized portion of the arm space.
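
For readers unfamiliar with the terms, the standard definition of a BMO function and one plausible formalization of $\delta$-regret are sketched below. The BMO seminorm is the textbook one. The $\delta$-regret notation ($f^\delta$, $\mu$, $x_t$, $T$) is an assumption made here for illustration and may differ from the notation used in the paper itself.

$$
\|f\|_{\mathrm{BMO}} \;=\; \sup_{Q} \frac{1}{|Q|} \int_{Q} \bigl| f(x) - \langle f \rangle_{Q} \bigr| \, dx ,
\qquad
\langle f \rangle_{Q} \;=\; \frac{1}{|Q|} \int_{Q} f(x) \, dx ,
$$

where the supremum runs over all cubes $Q$ in the domain, and $f$ is a BMO function when this quantity is finite. A classic example is $f(x) = -\log |x|$, which is unbounded near $0$ yet has bounded mean oscillation.

$$
f^{\delta} \;=\; \inf \bigl\{ z : \mu(\{ x : f(x) > z \}) \le \delta \bigr\},
\qquad
\text{($\delta$-regret)} \;=\; \sum_{t=1}^{T} \max\bigl\{ 0,\; f^{\delta} - f(x_t) \bigr\} ,
$$

where $\mu$ is a measure on the arm space and $x_t$ is the arm played at round $t$. Under this reading, the benchmark $f^{\delta}$ stays finite even when $f$ itself is unbounded, which is why regret is measured after discarding a $\delta$-sized set of arms.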