Mean field theory of neural networks: From stochastic gradient descent to Wasserstein gradient flows
[Moved Online] Hot Topics: Optimal Transport and Applications to Machine Learning and Statistics, May 04 - May 08, 2020
Location: SLMath: Online/Virtual
Neural networks
Mean field
Wasserstein gradient flow
Modern neural networks contain millions of parameters, and training them requires optimizing a highly non-convex objective. Despite the apparent complexity of this task, practitioners successfully train such models using simple first-order methods such as stochastic gradient descent (SGD). I will survey recent efforts to understand this surprising phenomenon using tools from the theory of partial differential equations. Namely, I will discuss a mean field limit in which the number of neurons becomes large and the SGD dynamics are approximated by a certain Wasserstein gradient flow. [Joint work with Adel Javanmard, Song Mei, Theodor Misiakiewicz, Marco Mondelli, Phan-Minh Nguyen]
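For orientation, here is a brief sketch of the standard two-layer setting in which this limit is usually stated; the notation below (the unit $\sigma_*$, the density $\rho_t$, the potential $\Psi$, and the squared-loss risk $R$) is illustrative and need not match the exact formulation used in the talk.

\[
  \hat f_N(x;\boldsymbol{\theta}) \;=\; \frac{1}{N}\sum_{i=1}^{N} \sigma_*(x;\theta_i),
  \qquad
  \hat\rho_N \;=\; \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i},
\]
where $\theta_1,\dots,\theta_N$ are the parameters of the $N$ neurons and $\hat\rho_N$ is their empirical distribution. As $N \to \infty$ (with the SGD step size scaled accordingly), the empirical distribution of the iterates is well approximated by a density $\rho_t$ evolving according to
\[
  \partial_t \rho_t \;=\; \nabla_\theta \cdot \big( \rho_t \, \nabla_\theta \Psi(\theta;\rho_t) \big),
  \qquad
  \Psi(\theta;\rho) \;=\; \frac{\delta R}{\delta \rho}(\theta),
\]
which is the Wasserstein-2 gradient flow of the population risk $R(\rho) = \mathbb{E}\big[\big(y - \int \sigma_*(x;\theta)\,\rho(\mathrm{d}\theta)\big)^2\big]$.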
H.264 Video | 928_28405_8336_Mean_Field_Theory_of_Neural_Networks-_From_Stochastic_Gradient_Descent_to_Wasserstein_Gradient_Flows.mp4