Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies

Part of the Proceedings of the International Conference on Machine Learning (ICML 2020) pre-proceedings


Authors

Shengpu Tang, Aditya Modi, Michael Sjoding, Jenna Wiens

Abstract

Standard reinforcement learning aims to find an optimal policy that identifies the best action for each state. However, in healthcare settings, many actions may be near-equivalent with respect to the reward (e.g., survival). We consider an alternative objective -- learning set-valued policies to capture near-equivalent actions that lead to similar cumulative rewards. We propose a model-free, off-policy algorithm based on temporal difference learning and a near-greedy action selection heuristic. We analyze the theoretical properties of the proposed algorithm, providing optimality guarantees, and demonstrate our approach on simulated environments and a real clinical task. Empirically, the proposed algorithm exhibits reasonably good convergence properties and discovers meaningful near-equivalent actions. Our work provides theoretical, as well as practical, foundations for clinician-in-the-loop decision making, in which clinicians/patients can incorporate additional knowledge (e.g., side effects and patient preference) to distinguish among near-equivalent actions.
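
As a rough illustration of the near-greedy, set-valued idea described in the abstract, the sketch below pairs a standard tabular Q-learning (temporal difference) update with an action-selection step that returns every action whose estimated value lies within a threshold of the best. The threshold name zeta, the helper near_greedy_set, and the toy environment are illustrative assumptions, not the paper's exact formulation.

    # Minimal sketch: off-policy TD (Q-learning) updates plus near-greedy,
    # set-valued action selection. All names/parameters are assumptions.
    import numpy as np

    def near_greedy_set(q_values, zeta):
        """Return all actions whose Q-value is within `zeta` of the best."""
        best = np.max(q_values)
        return np.flatnonzero(q_values >= best - zeta)

    def td_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        """Standard off-policy TD backup toward the greedy target."""
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    # Toy MDP with 5 states and 3 actions, driven by a random behavior policy.
    n_states, n_actions = 5, 3
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)

    for _ in range(1000):
        s = rng.integers(n_states)
        a = rng.integers(n_actions)        # exploratory behavior policy
        r = float(a == s % n_actions)      # toy reward
        s_next = rng.integers(n_states)
        td_update(Q, s, a, r, s_next)

    # The set-valued policy returns every near-equivalent action for a state,
    # leaving the final choice to the clinician/patient.
    print(near_greedy_set(Q[0], zeta=0.05))

In this sketch, a larger zeta yields larger action sets (more room for clinician/patient preference), while zeta = 0 recovers the usual single greedy action.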