A simpler approach to accelerated optimization: iterative averaging meets optimism

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)

Bibtex »Metadata »Paper »Supplemental »

Bibtek download is not availble in the pre-proceeding


Authors

Pooria Joulani, Anant Raj, András György, Csaba Szepesvari

Abstract

<p>Recently there have been several attempts to improve the rates of convergence achievable for smooth objectives. In particular, several recent papers have attempted to extend Nesterov's accelerated algorithm to stochastic and variance-reduced optimization. In this paper, we show that there is a simpler approach to obtaining accelerated rates: applying generic, well-known optimistic online learning algorithms and using the online average of their predictions to query the (deterministic or stochastic) first-order optimization oracle at each time step. In particular, we tighten the recent results of Cutkosky (2019) to demonstrate theoretically that online averaging results in a reduced optimization gap, independently of the algorithm involved. Then, we show that a simple tuning of existing generic optimistic online learning algorithms (e.g., Joulani et al [2017]), when combined with the reduced error quantified above, naturally results in optimal accelerated rates. \todoc{what would you consider unnatural? get rid of this word?} Importantly, the smooth objective may or may not be strongly-convex, and the rates are nevertheless optimal for both stochastic and deterministic first-order oracles. We further show that the same ideas transfer to variance-reduced optimization. In each case, the proofs are much simpler than the previous work, such as the new derivations of accelerated algorithms based on a primal-dual view (Wang and Abernethy, 2018) or the ideas based on linear coupling (Allen-Zhu and Orecchia, 2017). Importantly, we also provide algorithms that maintain the ``universality'' property, meaning that the same algorithm achieves the optimal rate for smooth and non-smooth objectives without further prior knowledge, generalizing the results of Kavis et al (2019) and solving a number of their open problems.</p>