Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)



Armin Eftekhari


<p>Linear networks provide valuable insight into the workings of neural networks in general.</p> <p>In this paper, we improve upon the state of the art (Bah et al., 2019) by identifying conditions under which gradient flow successfully trains a linear network, despite the non-strict saddle points in the optimization landscape.</p> <p>We also improve upon the state of the art for the computational complexity of training linear networks (Arora et al., 2018a) by establishing non-local linear convergence rates for gradient flow.</p> <p>Crucially, these new results lie outside the lazy training regime, which has been cautioned against (Chizat et al., 2019; Yehudai &amp; Shamir, 2019).</p> <p>Our results require the network to have a layer with a single neuron. This corresponds to the popular spiked covariance model in statistics and subsumes the important case of networks with a scalar output. Extending these results to all linear networks remains an open problem.</p>
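As a rough illustration of the setting only, here is a minimal sketch. It runs plain gradient descent with a small step size, which is a forward-Euler discretization of gradient flow and not the paper's continuous-time analysis. The model is a hypothetical depth-3 linear network whose output layer has a single neuron. It assumes whitened inputs, so the loss reduces to 1/2 ||W3 W2 W1 - Z||_F^2 for a target row vector Z. The helper names (`matmul`, `subtract`, `train`), the depth, widths, step size, and initialization are all illustrative assumptions. The initialization is chosen so that the starting loss (0.625) is below the loss at the zero product (2.5). With a small enough step the loss should not increase, which keeps the iterates away from the saddle point where the product vanishes.

```python
# Hypothetical sketch: plain gradient descent, used as a forward-Euler
# discretization of gradient flow, on a depth-3 linear network with a
# scalar output. Assumes whitened inputs, so the loss is
#   L(W1, W2, W3) = 1/2 * ||W3 W2 W1 - Z||_F^2.
# Matrices are lists of rows; no third-party libraries are needed.


def matmul(a, b):
    """Matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def subtract(a, b, scale=1.0):
    """Entrywise a - scale * b (returns a new matrix)."""
    return [[x - scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def half_sq_error(w, z):
    """1/2 * squared Frobenius norm of w - z."""
    return 0.5 * sum(x * x for row in subtract(w, z) for x in row)


def train(z, w1, w2, w3, eta=0.02, steps=5000):
    """Run gradient descent; return the final product and the loss history."""
    losses = []
    for _ in range(steps):
        w21 = matmul(w2, w1)  # W2 W1
        w32 = matmul(w3, w2)  # W3 W2
        e = subtract(matmul(w3, w21), z)  # residual E = W3 W2 W1 - Z
        losses.append(0.5 * sum(x * x for row in e for x in row))
        # Chain rule: each layer's gradient places E between the other layers.
        g1 = matmul(transpose(w32), e)                        # (W3 W2)^T E
        g2 = matmul(matmul(transpose(w3), e), transpose(w1))  # W3^T E W1^T
        g3 = matmul(e, transpose(w21))                        # E (W2 W1)^T
        w1, w2, w3 = subtract(w1, g1, eta), subtract(w2, g2, eta), subtract(w3, g3, eta)
    product = matmul(w3, matmul(w2, w1))
    losses.append(half_sq_error(product, z))
    return product, losses


if __name__ == "__main__":
    target = [[1.0, 2.0]]  # 1 x 2 row vector: the network has a scalar output
    eye = [[1.0, 0.0], [0.0, 1.0]]
    product, losses = train(target, eye, eye, [[0.5, 1.0]])
    print(losses[0], losses[-1])
    print(product)
```

Trying a different (still hypothetical) initialization whose product is close to zero is a quick way to see the iterates linger near the non-strict saddle at the origin. The paper's conditions and rates are stated for gradient flow, and this discrete loop only illustrates the qualitative behavior.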