
Analysis of Gradient Descent on Wide Two-Layer ReLU Neural Networks

[Moved Online] Hot Topics: Optimal transport and applications to machine learning and statistics, May 04, 2020 - May 08, 2020

May 08, 2020 (11:00 AM PDT - 12:00 PM PDT)
Speaker(s): Lenaic Chizat (Centre National de la Recherche Scientifique (CNRS))
Location: SLMath: Online/Virtual
Tags/Keywords
  • Neural networks
  • Wasserstein gradient flows
  • generalization
  • nonnegative measures


Abstract

In this talk, we propose an analysis of gradient descent on wide two-layer ReLU neural networks that leads to sharp characterizations of the learned predictor and strong generalization performance. The main idea is to study the dynamics in the limit where the width of the hidden layer goes to infinity, in which case it is a Wasserstein gradient flow. Although this dynamics evolves on a non-convex landscape, we show that its limit is a global minimizer if properly initialized. We also study the "implicit bias" of this algorithm when the objective is the unregularized logistic loss. We finally discuss what these results tell us about generalization performance. This is based on joint work with Francis Bach.
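
For illustration only (this is not the speaker's code), the following minimal NumPy sketch sets up the finite-width object that the talk studies: a two-layer ReLU network in mean-field scaling, f(x) = (1/m) * sum_j a_j * relu(<w_j, x>), trained by plain gradient descent on the unregularized logistic loss. The synthetic data, the width, the step size, and the choice to scale the step size by m are assumptions made for the example; the talk's results concern the infinite-width (Wasserstein gradient flow) limit of this kind of dynamics.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (illustrative assumption: linearly separable Gaussians).
n, d = 200, 2
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])          # labels in {-1, +1}

# Two-layer ReLU network in mean-field scaling: f(x) = (1/m) * sum_j a_j * relu(<w_j, x>).
m = 1000                                      # hidden-layer width (the "wide" regime)
W = rng.normal(size=(m, d))                   # hidden-layer weights, one row per neuron
a = rng.choice([-1.0, 1.0], size=m)           # output weights

def predict(X, W, a):
    H = np.maximum(X @ W.T, 0.0)              # (n, m) ReLU activations
    return H @ a / m

lr, steps = 0.5, 2000
for _ in range(steps):
    H = np.maximum(X @ W.T, 0.0)
    f = H @ a / m
    # Unregularized logistic loss: (1/n) * sum_i log(1 + exp(-y_i * f(x_i))).
    # Its derivative in f_i is -y_i * sigmoid(-y_i * f_i), written via tanh for stability.
    s = -y * 0.5 * (1.0 - np.tanh(y * f / 2.0))
    grad_a = H.T @ s / (n * m)
    D = (X @ W.T > 0.0).astype(float)         # ReLU derivative, shape (n, m)
    grad_W = a[:, None] * ((D * s[:, None]).T @ X) / (n * m)
    # Step size scaled by m so each neuron moves at O(1) speed, matching the
    # time parametrization used when taking the infinite-width limit (assumption).
    a -= lr * m * grad_a
    W -= lr * m * grad_W

acc = np.mean(np.sign(predict(X, W, a)) == y)
print(f"training accuracy after {steps} steps: {acc:.2f}")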

Supplements: No notes/supplements uploaded.
Video/Audio Files

H.264 Video 928_28397_8335_Analysis_of_Gradient_Descent_on_Wide_Two-Layer_ReLU_Neural_Networks.mp4
Trouble with the video?

Please report video problems to itsupport@slmath.org.
