Momentum-Based Policy Gradient Methods

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)


Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang


Policy gradient methods are a class of powerful algorithms in reinforcement learning (RL). Recently, several variance-reduced policy gradient methods have been developed to improve sample efficiency, attaining a near-optimal sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the non-concave performance function in model-free RL. However, the practical performance of these variance-reduced methods is not consistent with their near-optimal sample complexity, because they require large batches and strict learning rates to achieve it. In this paper, we therefore propose a class of efficient momentum-based policy gradient methods that use adaptive learning rates and do not require large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method that uses semi-Hessian information. In our theoretical analysis, we prove that both algorithms achieve the sample complexity $O(\epsilon^{-3})$, matching the best existing policy gradient methods. In the experiments, we use several benchmark tasks to demonstrate the effectiveness of our algorithms.
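
To make the idea of an importance-sampling momentum estimator concrete, the following is a minimal sketch on a toy problem. It is not taken from the paper. It assumes a STORM-style recursive momentum update, a single sampled action per step (no large batch), and a decaying step size of the form k / (m + t)^(1/3). The three-armed bandit, the reward values, the constants `k`, `m`, `c`, and all function names are hypothetical choices made for illustration only.

```python
import math
import random

# Hypothetical 3-armed bandit: fixed reward per arm (illustration only).
REWARDS = [0.2, 0.5, 1.0]


def softmax(theta):
    """Softmax policy over arms, computed stably."""
    top = max(theta)
    exps = [math.exp(x - top) for x in theta]
    total = sum(exps)
    return [e / total for e in exps]


def grad_estimate(arm, theta):
    """Single-sample REINFORCE estimate: r(arm) * grad log pi(arm | theta)."""
    probs = softmax(theta)
    return [REWARDS[arm] * ((1.0 if i == arm else 0.0) - p)
            for i, p in enumerate(probs)]


def importance_weight(arm, theta_old, theta_new):
    """pi(arm | theta_old) / pi(arm | theta_new) for an arm drawn under theta_new."""
    return softmax(theta_old)[arm] / softmax(theta_new)[arm]


def momentum_estimate(u_prev, arm, theta_old, theta_new, beta):
    """Assumed STORM-style update:
    u = beta * g_new + (1 - beta) * (u_prev + g_new - w * g_old),
    where w corrects for the sample having been drawn under theta_new.
    """
    g_new = grad_estimate(arm, theta_new)
    g_old = grad_estimate(arm, theta_old)
    w = importance_weight(arm, theta_old, theta_new)
    return [beta * gn + (1.0 - beta) * (up + gn - w * go)
            for up, gn, go in zip(u_prev, g_new, g_old)]


def run(steps=3000, k=0.5, m=8.0, c=4.0, seed=0):
    """Gradient ascent with one sample per step; k, m, c are hypothetical."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    arm = rng.choices(range(3), weights=softmax(theta))[0]
    u = grad_estimate(arm, theta)
    for t in range(steps):
        eta = k / (m + t) ** (1.0 / 3.0)   # decaying step size (assumed form)
        beta = min(1.0, c * eta * eta)     # momentum weight tied to step size
        theta_old = theta
        theta = [x + eta * ui for x, ui in zip(theta, u)]
        arm = rng.choices(range(3), weights=softmax(theta))[0]
        u = momentum_estimate(u, arm, theta_old, theta, beta)
    return theta
```

Calling `softmax(run())` gives the learned arm probabilities. Two sanity properties follow from the update as written: with `beta = 1.0` the estimator reduces to the plain single-sample gradient, and when the old and new parameters coincide the importance weight is exactly 1.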