Explicit construction of zero loss minimizers and the interpretability problem in Deep Learning
Kinetic Theory: Novel Statistical, Stochastic and Analytical Methods, October 20-24, 2025
Location: SLMath: Eisenbud Auditorium, Online/Virtual
Keywords: deep learning, gradient flow, effective equations, generalization bounds
In this talk, we present recent results aimed at a rigorous mathematical understanding of how and why supervised learning works. We identify genericity conditions related to the reachability of zero loss minimization in underparametrized versus overparametrized Deep Learning (DL) networks.

For underparametrized DL networks, we explicitly construct global zero loss minimizers of the cost for sufficiently clustered training data. In addition, we derive effective equations governing the cumulative biases and weights, and show that gradient descent corresponds to a dynamical process in the input layer whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points already truncated.

For overparametrized DL networks, we prove that the gradient descent flow is homotopy equivalent to a geometrically adapted flow that induces a (constrained) Euclidean gradient flow in output space. If a certain rank condition holds, the latter is, upon reparametrization of the time variable, equivalent to simple linear interpolation; this in turn implies zero loss minimization and the phenomenon known as “Neural Collapse”. Moreover, we derive zero loss guarantees and construct explicit global minimizers for overparametrized deep networks with generic training data. We apply these results to obtain deterministic generalization bounds that depend on the geometry of the training and test data, but not on the network architecture.

The work presented includes collaborations with Patricia Munoz Ewald, Andrew G. Moore, and C.-K. Kevin Chien.
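To make the linear-interpolation statement concrete, here is a minimal worked computation under a simplifying assumption: the abstract does not specify the cost, so we take a plain quadratic loss in output space for a single sample, and we suppress the rank condition and the constraint. This is an illustrative sketch, not the construction from the work presented. Write the Euclidean gradient flow in output space for the cost \( \mathcal C[x] = \tfrac12 |x - y|^2 \), where \( x(t) \) is the network output and \( y \) the target:

  \[ \dot x(t) = -\nabla_x \mathcal C[x(t)] = -(x(t) - y), \qquad x(0) = x_0, \]

with explicit solution

  \[ x(t) = y + e^{-t}\,(x_0 - y). \]

Under the time reparametrization \( s = 1 - e^{-t} \in [0, 1) \), this becomes

  \[ x = (1 - s)\, x_0 + s\, y, \]

i.e., linear interpolation from the initial output \( x_0 \) to the target \( y \); as \( t \to \infty \) (so \( s \to 1 \)), \( x \to y \) and the loss tends to zero.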
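The same toy calculation can be checked numerically. The following short Python sketch (hypothetical and self-contained, using only NumPy; not the authors' code) integrates the flow above with explicit Euler steps and verifies that the reparametrized trajectory coincides with linear interpolation:

import numpy as np

# Illustrative sketch: integrate the output-space gradient flow
# dx/dt = -(x - y) for a quadratic loss, and verify that after the
# time change s = 1 - exp(-t) the trajectory is linear interpolation
# x(s) = (1 - s) * x0 + s * y between initial output and target.

rng = np.random.default_rng(0)
Q = 5                       # output-space dimension (arbitrary choice)
x0 = rng.normal(size=Q)     # initial network output for one training sample
y = rng.normal(size=Q)      # target label in output space

dt, T = 1e-3, 10.0
x = x0.copy()
n_steps = int(T / dt)
for k in range(1, n_steps + 1):
    x = x + dt * (y - x)             # explicit Euler step of dx/dt = -(x - y)
    s = 1.0 - np.exp(-k * dt)        # reparametrized time, s in [0, 1)
    x_lin = (1.0 - s) * x0 + s * y   # linear interpolation at the same s
    assert np.allclose(x, x_lin, atol=1e-2), "flow deviates from interpolation"

print("distance to target after integration:", np.linalg.norm(x - y))

The assertion passes because, for this linear flow, Euler's method stays within O(dt) of the exact solution; in the setting of the talk, the corresponding statement is conditional on the stated rank condition.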