Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)



Authors

Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov

Abstract

According to previous studies, one of the major impediments to accurate off-policy learning is overestimation bias. This paper investigates a novel way to alleviate overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics. We show that all three components are key to the achieved performance. The distributional representation combined with truncation allows for arbitrarily granular control of overestimation, and ensembling further improves the results of our method. TQC significantly outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.
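
The interplay of the three ideas can be illustrated with a small sketch. The abstract does not spell out the procedure, so the following is only one plausible reading under stated assumptions: each critic outputs a list of quantile "atoms", the atoms of all critics are pooled into a single mixture, and a fixed number of the largest atoms per critic is discarded before averaging. The names `truncated_mixture`, `critic_atoms`, and `drop_per_critic` are hypothetical and chosen for illustration only.

```python
from typing import List


def truncated_mixture(critic_atoms: List[List[float]], drop_per_critic: int) -> List[float]:
    """Pool atoms from all critics, sort them, and drop the largest ones.

    Assumption (not stated in the abstract): truncation removes
    `drop_per_critic * number_of_critics` atoms from the top of the pooled,
    sorted mixture.
    """
    n_critics = len(critic_atoms)
    # Ensembling: merge every critic's atoms into one mixture.
    pooled = sorted(atom for atoms in critic_atoms for atom in atoms)
    n_drop = drop_per_critic * n_critics
    if not 0 <= n_drop < len(pooled):
        raise ValueError("drop_per_critic must leave at least one atom")
    # Truncation: keep only the smallest atoms.
    return pooled[: len(pooled) - n_drop]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Two hypothetical critics with four atoms each; the largest atoms are
# deliberately inflated to mimic overestimation.
critics = [
    [1.0, 2.0, 3.0, 10.0],
    [0.5, 2.5, 3.5, 12.0],
]

untruncated = truncated_mixture(critics, drop_per_critic=0)
kept = truncated_mixture(critics, drop_per_critic=1)

print(kept)
print(mean(untruncated), mean(kept))
```

In this sketch, `drop_per_critic` is the knob that sets how aggressively the estimate is lowered: with more atoms per critic, each dropped atom removes a smaller share of the distribution, which is one way to read the abstract's claim of granular overestimation control.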