"Dynamics and Convergence of Weight Normalization for Training Neural Networks"
Montufar, Guido
We present a result on the convergence of weight normalized
training of artificial neural networks. In the analysis, we consider
over-parameterized 2-layer networks with rectified linear units
(ReLUs) initialized at random and trained with batch gradient descent
and a fixed step size. The proof builds on recent theoretical works
that bound the trajectory of parameters from their initialization and
monitor the network predictions via the evolution of a "neural
tangent kernel" (Jacot et al. 2018). We discover that training with
weight normalization decomposes this kernel via the so-called
"length-direction decoupling". This in turn leads to two convergence
regimes. From the modified convergence, we make a few curious
observations, including a natural form of "lazy training" in which the
direction of each weight vector remains stationary.
This is joint work with Yonatan Dukler and Quanquan Gu.
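
For readers unfamiliar with the setup, the following is a minimal NumPy sketch of a weight-normalized 2-layer ReLU network trained with batch gradient descent and a fixed step size. It is an illustration under stated assumptions, not the construction analyzed in the talk: each weight is written as w_k = g_k v_k / ||v_k||, and the function names (init_params, forward, gd_step), the squared loss, the 1/sqrt(m) output scaling, the fixed random output signs, and the initialization are all hypothetical choices made for this sketch.

```python
import numpy as np

def init_params(m, d, rng):
    """Random initialization (an assumption of this sketch)."""
    V = rng.standard_normal((m, d))        # direction parameters v_k
    g = np.linalg.norm(V, axis=1)          # length parameters g_k, so w_k = v_k at start
    c = rng.choice([-1.0, 1.0], size=m)    # fixed output signs, not trained
    return V, g, c

def forward(V, g, c, X):
    """Network output with weights w_k = g_k * v_k / ||v_k||."""
    norms = np.linalg.norm(V, axis=1)
    W = (g / norms)[:, None] * V
    return np.maximum(X @ W.T, 0.0) @ c / np.sqrt(len(c))

def gd_step(V, g, c, X, y, lr):
    """One batch gradient step on 0.5 * sum of squared residuals."""
    m = len(c)
    norms = np.linalg.norm(V, axis=1)
    U = V / norms[:, None]                 # unit directions v_k / ||v_k||
    pre = X @ (g[:, None] * U).T           # pre-activations, shape (n, m)
    act = (pre > 0).astype(float)          # ReLU derivative
    resid = np.maximum(pre, 0.0) @ c / np.sqrt(m) - y
    # Gradient with respect to the effective weights w_k.
    grad_W = (c[:, None] / np.sqrt(m)) * ((act * resid[:, None]).T @ X)
    # Length-direction decoupling via the chain rule:
    # the length gradient is the component of grad_W along u_k,
    # the direction gradient is the component orthogonal to u_k.
    grad_g = np.sum(grad_W * U, axis=1)
    grad_V = (g / norms)[:, None] * (grad_W - grad_g[:, None] * U)
    return V - lr * grad_V, g - lr * grad_g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 5))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = rng.standard_normal(20)
    V, g, c = init_params(50, 5, rng)
    for _ in range(100):
        V, g = gd_step(V, g, c, X, y, lr=0.01)
    print(0.5 * np.sum((forward(V, g, c, X) - y) ** 2))
```

In this sketch the gradient with respect to each v_k is orthogonal to v_k, so a step changes the direction only through the tangential component, while g_k absorbs the radial component. This is one concrete way to see the split of the dynamics into a length part and a direction part.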