"Paper Presentation: Gradient Descent Provably Optimizes Over-parameterized Neural Networks"Berner, JuliusPresentation of a paper by Simon S. Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh entitled"Gradient Descent Provably Optimizes Over-parameterized Neural Networks"
Abstract:
One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies this surprising phenomenon for two-layer fully connected ReLU-activated neural networks. For a shallow neural network with m hidden nodes, ReLU activation, and n training data points, we show that, as long as m is large enough and no two inputs are parallel, randomly initialized gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function.
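To make the setting concrete, the following is a minimal sketch (not the paper's code) of the regime studied in the abstract: a two-layer ReLU network with m hidden units and fixed random output weights, trained on n data points by full-batch gradient descent on the quadratic loss. All sizes, the learning rate, and the data generation below are illustrative assumptions, not the constants required by the paper's theorem.

```python
import numpy as np

# Illustrative sketch of the over-parameterized two-layer ReLU setting:
# f(W, a, x) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x),
# with output weights a_r in {-1, +1} held fixed and only the first-layer
# weights W trained by gradient descent on the quadratic loss.
rng = np.random.default_rng(0)

n, d, m = 20, 5, 2000          # samples, input dimension, hidden width (m >> n)
eta = 0.05                     # step size (illustrative choice)

# Unit-norm inputs; random data are non-parallel with high probability
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))        # randomly initialized trainable layer
a = rng.choice([-1.0, 1.0], size=m)    # fixed random output weights

def forward(W):
    pre = X @ W.T                      # (n, m) pre-activations
    return (np.maximum(pre, 0) @ a) / np.sqrt(m), pre

for t in range(1000):
    pred, pre = forward(W)
    residual = pred - y
    # Gradient of 0.5 * sum_i (f(x_i) - y_i)^2 with respect to W
    grad = ((residual[:, None] * (pre > 0) * a).T @ X) / np.sqrt(m)
    W -= eta * grad
    if t % 100 == 0:
        print(t, 0.5 * np.sum(residual ** 2))
```

Running the sketch, the printed training loss shrinks roughly geometrically, which is the kind of linear convergence to zero training loss that the paper establishes rigorously for sufficiently large m.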