On the Iteration Complexity of Hypergradient Computations

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)

Bibtex »Metadata »Paper »Supplemental »

Bibtek download is not availble in the pre-proceeding


Authors

Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

Abstract

<p>We study a general class of bilevel optimization problems, in which the upper-level objective is defined via the solution of a fixed point equation. Important instances arising in machine learning include hyper-parameter optimization, meta-learning, graph and recurrent neural networks. Typically the gradient of the upper-level objective is not known explicitly or is hard to compute exactly, which has raised the interest in approximation methods. We investigate two popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. We present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirm the theoretical findings.</p>