May 04, 2020
Monday
|
09:15 AM - 09:30 AM
|
|
Welcome
David Eisenbud (University of California, Berkeley)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
- --
- Supplements
-
--
|
09:30 AM - 10:30 AM
|
|
Scaling Optimal Transport for High-Dimensional Learning
Gabriel Peyré (École Normale Supérieure)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool for comparing probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is however plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will review entropic regularization methods which define geometric loss functions approximating OT with a better sample complexity. More information and references can be found on the website of our book "Computational Optimal Transport": https://optimaltransport.github.io/
- Supplements
-
--
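To accompany the abstract above, here is a minimal, self-contained sketch of entropic OT computed with Sinkhorn iterations, the workhorse behind the regularized losses mentioned in the talk. The toy point clouds, the cost normalization, and the value of eps are illustrative assumptions, not the speaker's setup.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT between histograms a and b with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # scale rows to match marginal a
        v = b / (K.T @ u)                # scale columns to match marginal b
    P = u[:, None] * K * v[None, :]      # approximate optimal coupling
    return np.sum(P * C)                 # transport cost <P, C>

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 2))                              # two toy point clouds in R^2
y = rng.normal(size=(80, 2)) + 2.0
C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)  # squared Euclidean costs
C = C / C.max()                                           # keep exp(-C/eps) well-scaled
a = np.full(60, 1 / 60)
b = np.full(80, 1 / 80)
print("entropic OT cost:", sinkhorn(a, b, C))
```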
|
10:30 AM - 11:00 AM
|
|
Break
|
- Location
- --
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
11:00 AM - 12:00 PM
|
|
Linear Unbalanced Optimal Transport
Matthew Thorpe (University of Manchester)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Optimal transport is a powerful tool for measuring the distances between signals. However, the most common choice is to use the Wasserstein distance, where one is required to treat the signal as a probability measure. This places restrictive conditions on the signals, and although ad-hoc renormalisation can be applied to sets of unnormalised measures, this can often dampen features of the signal. The second disadvantage is that, despite recent advances, computing optimal transport distances for large sets is still difficult. In this talk I will focus on the Hellinger-Kantorovich distance, which can be applied between any pair of non-negative measures. I will describe how the distance can be linearised and embedded into a Euclidean space (the analogue of the linear optimal transport framework for Hellinger-Kantorovich). The Euclidean distance in the embedded space is approximately the Wasserstein distance in the original space. This method, in particular, allows for the application of off-the-shelf data analysis tools such as principal component analysis.
This is joint work with Bernhard Schmitzer (TU Munich).
- Supplements
-
--
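The sketch below illustrates only the balanced linear OT idea in one dimension, where the optimal maps to a uniform reference measure are quantile functions; the Hellinger-Kantorovich linearisation for unnormalised measures discussed in the talk is not reproduced here, and the Gaussian datasets and grid size are illustrative assumptions.

```python
import numpy as np

def lot_embedding(samples, n_quantiles=200):
    """Embed a 1D sample set as its quantile function on a uniform grid.

    With reference sigma = Uniform(0, 1), the optimal map from sigma to mu is
    the quantile function of mu, so Euclidean distances between embeddings
    recover 2-Wasserstein distances between the measures (up to discretization)."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(samples, q)

rng = np.random.default_rng(0)
datasets = [rng.normal(loc=m, scale=1.0, size=500) for m in (0.0, 1.0, 3.0)]
E = np.stack([lot_embedding(s) for s in datasets])   # each measure becomes a vector in R^200

# Pairwise Euclidean distances in the embedded space; off-the-shelf tools such
# as PCA can now be applied directly to the rows of E.
D = np.sqrt(np.mean((E[:, None, :] - E[None, :, :]) ** 2, axis=2))
print(np.round(D, 2))   # close to |m_i - m_j| for equal-variance Gaussians
```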
|
12:00 PM - 02:00 PM
|
|
Break
|
- Location
- SLMath: Online/Virtual
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
02:00 PM - 03:00 PM
|
|
Computing Wasserstein barycenters using gradient descent algorithms
Philippe Rigollet (Massachusetts Institute of Technology)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
In this talk, I will present rates of convergence for Wasserstein barycenters using gradient descent and stochastic gradient descent. While the barycenter functional is not geodesically convex, this result hinges on a Polyak-Lojasiewicz (PL) inequality in the case where the underlying distribution is supported on a subset of Gaussian distributions.
- Supplements
-
Notes
12.7 MB application/pdf
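To illustrate the Gaussian setting mentioned in the abstract, here is a minimal sketch of a geodesic gradient-descent-type update for the barycenter of centered Gaussians in the Bures-Wasserstein geometry; with step size 1 it coincides with the classical fixed-point iteration. The covariances, step size and iteration count are illustrative assumptions, not the schedules or rates analyzed in the talk.

```python
import numpy as np

def sqrt_spd(A):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_barycenter_gd(Sigmas, weights, n_iter=50, step=1.0):
    """Gradient-descent-type iteration for the barycenter of N(0, Sigma_i)."""
    d = Sigmas[0].shape[0]
    S = np.eye(d)                                    # initial guess
    for _ in range(n_iter):
        S_half = sqrt_spd(S)
        S_half_inv = np.linalg.inv(S_half)
        # optimal maps T_i from N(0, S) to N(0, Sigma_i)
        Ts = [S_half_inv @ sqrt_spd(S_half @ Sig @ S_half) @ S_half_inv
              for Sig in Sigmas]
        T_bar = sum(w * T for w, T in zip(weights, Ts))
        M = (1.0 - step) * np.eye(d) + step * T_bar  # geodesic step toward the averaged map
        S = M @ S @ M
    return S

rng = np.random.default_rng(0)
Sigmas = []
for _ in range(5):
    A = rng.normal(size=(3, 3))
    Sigmas.append(A @ A.T + 0.5 * np.eye(3))         # random SPD covariances
weights = np.full(5, 1 / 5)
print(np.round(bw_barycenter_gd(Sigmas, weights), 3))
```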
|
|
|
May 05, 2020
Tuesday
|
09:30 AM - 10:30 AM
|
|
Kalman-Wasserstein Gradient Flows
Franca Hoffmann (California Institute of Technology)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
We study a class of interacting particle systems that may be used for optimization. By considering the mean-field limit one obtains a nonlinear Fokker-Planck equation. This equation exhibits a gradient structure in probability space, based on a modified Wasserstein distance which reflects particle correlations: the Kalman-Wasserstein metric. This setting gives rise to a methodology for calibrating and quantifying uncertainty for parameters appearing in complex computer models that are expensive to run and cannot readily be differentiated. This is achieved by connecting the interacting particle system to ensemble Kalman methods for inverse problems. This is joint work with Alfredo Garbuno-Inigo (Caltech), Wuchen Li (UCLA) and Andrew Stuart (Caltech).
- Supplements
-
--
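The sketch below shows the basic derivative-free ensemble Kalman inversion update that such interacting particle systems build on: only forward evaluations of the model are needed. The toy linear forward model, noise level and ensemble size are illustrative assumptions, and the Kalman-Wasserstein gradient-flow structure itself is not reproduced here.

```python
import numpy as np

def G(theta):
    """Toy forward model R^2 -> R^3 (a stand-in for an expensive simulator)."""
    A = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 1.0]])
    return A @ theta

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5])
gamma = 0.1                                    # observational noise std
y = G(theta_true) + gamma * rng.normal(size=3)

J = 200                                        # ensemble size
thetas = rng.normal(size=(J, 2))               # prior ensemble
Gamma = gamma ** 2 * np.eye(3)

for _ in range(30):
    Gs = np.array([G(t) for t in thetas])      # derivative-free: forward evaluations only
    t_mean, g_mean = thetas.mean(0), Gs.mean(0)
    C_tg = (thetas - t_mean).T @ (Gs - g_mean) / J   # parameter-output cross-covariance
    C_gg = (Gs - g_mean).T @ (Gs - g_mean) / J       # output covariance
    K = C_tg @ np.linalg.inv(C_gg + Gamma)           # Kalman-type gain
    noise = rng.multivariate_normal(np.zeros(3), Gamma, size=J)
    thetas = thetas + (y + noise - Gs) @ K.T         # ensemble update

print("ensemble mean:", thetas.mean(0), " truth:", theta_true)
```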
|
10:30 AM - 11:00 AM
|
|
Break
|
- Location
- --
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
11:00 AM - 12:00 PM
|
|
A Deeper Understanding of the Quadratic Wasserstein Metric in Inverse Data Matching
Yunan Yang (New York University, Courant Institute)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
We provide analytical and computational characterizations of general inverse data matching problems based on the quadratic Wasserstein distance, under both deterministic and Bayesian approaches. We show that the quadratic Wasserstein metric has a "smoothing" effect on the inversion process, making it very robust against high-frequency noise in the data but leading to a reduced resolution for the reconstructed objects at a given noise level.
- Supplements
-
Notes
4.33 MB application/pdf
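A minimal sketch of the basic ingredient above: using the 1D quadratic Wasserstein distance, computed through inverse CDFs, as a misfit between nonnegative normalized signals. The pulse signals and the normalization choice are illustrative assumptions; the example only shows that, unlike the L2 misfit, the W2 misfit keeps growing smoothly as the two signals are shifted apart.

```python
import numpy as np

def w2sq_1d(p, q, x):
    """Squared quadratic Wasserstein distance between 1D densities p, q on grid x."""
    t = np.linspace(1e-3, 1 - 1e-3, 2000)           # quantile levels
    Fp = np.cumsum(p); Fp = Fp / Fp[-1]              # CDFs (normalization handled here)
    Fq = np.cumsum(q); Fq = Fq / Fq[-1]
    return np.mean((np.interp(t, Fp, x) - np.interp(t, Fq, x)) ** 2)

x = np.linspace(0.0, 5.0, 1001)
dx = x[1] - x[0]
pulse = lambda s: np.exp(-((x - s) ** 2) / 0.02) + 1e-6   # positive toy "signal"
ref = pulse(1.0)

for shift in (0.0, 0.5, 1.0, 2.0):
    f = pulse(1.0 + shift)
    l2 = np.sum((f - ref) ** 2) * dx                 # L2 misfit saturates once pulses separate
    print(f"shift={shift:3.1f}  L2={l2:7.3f}  W2^2={w2sq_1d(f, ref, x):6.3f}")
```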
|
|
12:00 PM - 02:00 PM
|
|
Break
|
- Location
- SLMath: Online/Virtual
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
02:00 PM - 03:00 PM
|
|
A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems
Samy Wu Fung (University of California, Los Angeles)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Mean field games (MFG) and mean field control (MFC) are critical classes of multi-agent models for the efficient analysis of massive populations of interacting agents. Their areas of application span topics in economics, finance, game theory, industrial engineering, crowd motion, and more. In this work, we provide a flexible machine learning framework for the numerical solution of potential MFG and MFC models. State-of-the-art numerical methods for solving such problems utilize spatial discretization that leads to a curse of dimensionality. We approximately solve high-dimensional problems by combining Lagrangian and Eulerian viewpoints and leveraging recent advances from machine learning. More precisely, we work with a Lagrangian formulation of the problem and enforce the underlying Hamilton–Jacobi–Bellman (HJB) equation that is derived from the Eulerian formulation. Finally, a tailored neural network parameterization of the MFG/MFC solution helps us avoid any spatial discretization. Our numerical results include the approximate solution of 100-dimensional instances of optimal transport and crowd motion problems on a standard workstation and a validation using an Eulerian solver in two dimensions. These results open the door to much-anticipated applications of MFG and MFC models that are beyond reach with existing numerical methods.
- Supplements
-
--
|
|
May 06, 2020
Wednesday
|
09:30 AM - 10:30 AM
|
|
From quantization of measures to weighted ultrafast diffusion equations
Mikaela Iacobelli (ETH Zurich)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
In this talk I will discuss some recent results on the asymptotic behaviour of a family of weighted ultrafast diffusion PDEs. These equations are motivated by the gradient flow approach to the problem of quantization of measures, introduced in a series of joint papers with Emanuele Caglioti and François Golse. In this presentation I will focus on a result obtained in collaboration with Francesco Saverio Patacchini and Filippo Santambrogio, where we use the JKO scheme to obtain existence, uniqueness, and exponential convergence to equilibrium under minimal assumptions on the data.
- Supplements
-
--
|
10:30 AM - 11:00 AM
|
|
Break
|
- Location
- --
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
11:00 AM - 12:00 PM
|
|
Equality of the Jellium and Uniform Electron Gas next-order asymptotic terms for Coulomb and Riesz potentials
Codina Cotar (University College London)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
We consider two sharp next-order asymptotics problems, namely the asymptotics for the minimum energy of optimal point configurations and the asymptotics for the many-marginals Optimal Transport, in both cases with Riesz costs with inverse power-law long range interactions. The first problem describes the ground state of a Coulomb or Riesz gas, while the second appears as a semiclassical limit of the Density Functional Theory energy modelling a quantum version of the same system. Recently the second-order term in these expansions was precisely described, and corresponds respectively to a Jellium and to a Uniform Electron Gas model. The present work shows that for inverse-power-law interactions with power s∈[d−2,d) in d dimensions, the two problems have the same minimum value asymptotically. For the Coulomb case in d=3, our result verifies the physicists' long-standing conjecture regarding the equality of the second-order terms for these two problems. Furthermore, our work implies that, whereas the minimum values are equal, the minimizers may be different. Moreover, provided the crystallization hypothesis in d=3 holds (the analogue of Abrikosov's conjecture in d=2), our result verifies the physicists' conjectured ≈1.4442 lower bound on the famous Lieb-Oxford constant. Our work also rigorously confirms some of the predictions formulated by physicists regarding the optimal value of the Uniform Electron Gas second-order asymptotic term. We also show that on the whole range s∈(0,d), the Uniform Electron Gas second-order constant is continuous in s.
- Supplements
-
Notes
590 KB application/pdf
|
|
|
May 07, 2020
Thursday
|
09:30 AM - 10:30 AM
|
|
Regularity theory and uniform convergence in the large data limit of graph Laplacian eigenvectors on random data clouds
Nicolas Garcia Trillos (University of Wisconsin-Madison)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Graph Laplacians are omnipresent objects in machine learning that have been used in supervised, unsupervised and semi-supervised settings due to their versatility in extracting local and global geometric information from data clouds. In this talk I will present an overview of how the mathematical theory built around them has gotten deeper and deeper, layer by layer, since the appearance of the first results on pointwise consistency in the 2000s, up to the most recent developments; this line of research has found strong connections between PDEs built on proximity graphs on data clouds and PDEs on manifolds, and has given a more precise mathematical meaning to the task of “manifold learning”. In the first part of the talk I will highlight how ideas from optimal transport made some of the initial steps possible, providing L2-type error estimates between the spectra of graph Laplacians and Laplace-Beltrami operators. In the second part of the talk, which is based on recent work with Jeff Calder and Marta Lewicka, I will present a newly developed regularity theory for graph Laplacians which, among other things, allows us to bootstrap the L2 error estimates developed through optimal transport and upgrade them to uniform convergence and almost C^{0,1} convergence rates. The talk can be seen as a tale of how a flow of ideas from optimal transport, PDEs, and, in general, analysis has made possible a finer understanding of concrete objects that have been popular in data analysis and machine learning.
- Supplements
-
Notes
3.49 MB application/pdf
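As a companion to the abstract above, here is a small numerical illustration of spectral consistency: an epsilon-graph Laplacian built on points sampled from the unit circle has low-lying eigenvalues whose ratios approach the Laplace-Beltrami spectrum k^2 on S^1, each eigenvalue appearing with multiplicity two. The sample size, connectivity radius, and the use of eigenvalue ratios (to sidestep kernel-dependent scaling constants) are illustrative choices, not the quantitative rates from the talk.

```python
import numpy as np

n, eps = 1500, 0.3
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])      # data cloud on the circle

# epsilon-graph weights and unnormalized graph Laplacian L = D - W
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
W = (D2 < eps ** 2).astype(float)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

evals = np.linalg.eigvalsh(L)
ratios = evals[1:7] / evals[1]   # scale-free comparison with the manifold Laplacian
print(np.round(ratios, 2))       # roughly 1, 1, 4, 4, 9, 9 for S^1
```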
|
|
10:30 AM - 11:00 AM
|
|
Break
|
- Location
- --
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
11:00 AM - 12:00 PM
|
|
From an ODE to accelerated stochastic gradient descent: convergence rate and empirical results
Adam Oberman (McGill University)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Su, Boyd and Candès showed that Nesterov's method for accelerated gradient descent can be obtained as the discretization of an Ordinary Differential Equation (ODE). By discretizing a modified ODE we show that: (i) we recover Nesterov's method for convex functions using a constant learning rate, and (ii) choosing a decreasing learning rate leads to an accelerated stochastic gradient descent algorithm. The learning rate for the stochastic algorithm decreases on the order of k^(-3/4), which is novel. We also study a second ODE and obtain a corresponding algorithm for the strongly convex case.
Convergence rates for the algorithms are established using a Lyapunov function approach which unifies the continuous time, deterministic gradient, and stochastic gradient cases. The stochastic algorithms are accelerated (in the convex case and the stochastic approximation setting), in the sense that the rate constant improves over existing results. Existing results have a rate constant which depends on G^2, the bound on the norm of the stochastic gradient. The accelerated algorithm's rate constant instead depends on the smaller quantity \sigma^2, the variance of the stochastic gradient.
Numerical results, in both the stochastic approximation and the finite sum settings, demonstrate that the accelerated algorithms can converge faster than SGD or momentum SGD algorithms. Moreover, the algorithms are easily tuned, in the sense that empirical results are robust to changes in the learning rate schedule and constants. This is joint work with Maxime Laborde.
- Supplements
-
Notes
4.64 MB application/pdf
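For illustration, here is a hedged sketch of the Su-Boyd-Candès connection mentioned above: the familiar Nesterov iteration viewed as a discretization of x'' + (3/t) x' + grad f(x) = 0, together with a noisy-gradient variant using a decreasing step size. The quadratic objective, the noise model and the particular step-size schedule are illustrative assumptions, not the precise algorithm or rates from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
b = rng.normal(size=50)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
f_star = f(np.linalg.lstsq(A, b, rcond=None)[0])
L = np.linalg.norm(A, 2) ** 2                      # Lipschitz constant of grad f

def nesterov_style(noise=0.0, n_iter=2000):
    """x_{k+1} = y_k - h_k g(y_k),  y_k = x_k + (k-1)/(k+2) (x_k - x_{k-1}):
    the standard discretization of x'' + (3/t) x' + grad f(x) = 0.
    With noise > 0, g is a noisy gradient oracle and h_k decreases with k."""
    x_prev = x = np.zeros(20)
    for k in range(1, n_iter + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        g = grad(y) + noise * rng.normal(size=20)
        h_k = 1.0 / L if noise == 0.0 else (1.0 / L) * k ** (-0.75)  # illustrative schedule
        x_prev, x = x, y - h_k * g
    return f(x) - f_star

print("deterministic accelerated, suboptimality:", nesterov_style(0.0))
print("stochastic with decreasing step, suboptimality:", nesterov_style(0.5))
```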
|
|
12:00 PM - 02:00 PM
|
|
Break
|
- Location
- SLMath: Online/Virtual
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
02:00 PM - 03:00 PM
|
|
Learning with Few Labeled Data
Pratik Chaudhari (University of Pennsylvania)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
The human visual system is proof that it is possible to learn new categories with extremely few samples; humans do not need a million samples to learn to distinguish a poisonous mushroom from an edible one in the wild. Such ability, arguably, comes from having seen millions of other categories and transferring learnt representations to the new categories. This talk will present a formal connection of machine learning with thermodynamics to characterize the quality of learnt representations for transfer learning. We will discuss how information-theoretic functionals such as rate, distortion and classification loss lie on a convex, so-called, equilibrium surface. We prescribe dynamical processes to traverse this surface under constraints, e.g., an iso-classification process that modulates rate and distortion to keep the classification loss unchanged. We demonstrate how such processes allow complete control over the transfer from a source dataset to a target dataset and can guarantee the performance of the final model. This talk will discuss results from https://arxiv.org/abs/1909.02729 and https://arxiv.org/abs/2002.12406.
- Supplements
-
--
|
03:00 PM - 03:30 PM
|
|
Break
|
- Location
- --
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
03:30 PM - 04:30 PM
|
|
Fusion with Optimal Transport
Justin Solomon (Massachusetts Institute of Technology)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Many problems in learning and statistics involve fusing potentially conflicting signals into a single coherent observation about a system or environment. Optimal transport provides a valuable language for posing and solving such fusion problems in a mathematically-justified and efficient framework. In this talk, I will summarize efforts in our group on developing efficient algorithms for model fusion drawing from state-of-the-art algorithms for computational transport.
- Supplements
-
--
|
|
May 08, 2020
Friday
|
09:30 AM - 10:30 AM
|
|
Learning with entropy-regularized optimal transport
Aude Genevay (Massachusetts Institute of Technology)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Entropy-regularized OT (EOT) was first introduced by Cuturi in 2013 as a solution to the computational burden of OT for machine learning problems. In this talk, after studying the properties of EOT, we will introduce a new family of losses between probability measures called Sinkhorn Divergences. Based on EOT, this family of losses actually interpolates between OT (no regularization) and MMD (infinite regularization). We will illustrate these theoretical claims on a set of learning problems formulated as minimizations over the space of measures.
- Supplements
-
--
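A minimal sketch of one common form of the Sinkhorn divergence built from entropic OT, S_eps(a, b) = OT_eps(a, b) - (OT_eps(a, a) + OT_eps(b, b)) / 2, evaluated here on uniform point clouds. The debiasing variant (transport cost rather than the full regularized objective), the cloud sizes and the eps values are illustrative assumptions, not the exact definition used in the talk.

```python
import numpy as np

def ot_eps(a, b, C, eps, n_iter=500):
    """Entropic OT transport cost <P, C> via Sinkhorn iterations."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.sum((u[:, None] * K * v[None, :]) * C)

def sinkhorn_divergence(x, y, eps):
    """S_eps(a, b) = OT_eps(a, b) - (OT_eps(a, a) + OT_eps(b, b)) / 2, uniform weights."""
    a = np.full(len(x), 1 / len(x))
    b = np.full(len(y), 1 / len(y))
    cost = lambda u, w: np.sum((u[:, None, :] - w[None, :, :]) ** 2, axis=2)
    return (ot_eps(a, b, cost(x, y), eps)
            - 0.5 * ot_eps(a, a, cost(x, x), eps)
            - 0.5 * ot_eps(b, b, cost(y, y), eps))

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = rng.normal(size=(100, 2)) + 1.0
for eps in (0.2, 1.0, 10.0):   # eps controls the interpolation described in the abstract
    print(f"eps={eps:5.1f}  divergence={sinkhorn_divergence(x, y, eps):.3f}")
```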
|
10:30 AM - 11:00 AM
|
|
Break
|
- Location
- --
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
11:00 AM - 12:00 PM
|
|
Analysis of Gradient Descent on Wide Two-Layer ReLU Neural Networks
Lénaïc Chizat (Centre National de la Recherche Scientifique (CNRS))
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
In this talk, we propose an analysis of gradient descent on wide two-layer ReLU neural networks that leads to sharp characterizations of the learned predictor and to strong generalization performance. The main idea is to study the dynamics when the width of the hidden layer goes to infinity, which is a Wasserstein gradient flow. While this dynamics evolves on a non-convex landscape, we show that its limit is a global minimizer if initialized properly. We also study the "implicit bias" of this algorithm when the objective is the unregularized logistic loss. We finally discuss what these results tell us about the generalization performance. This is based on joint work with Francis Bach.
- Supplements
-
--
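As a toy companion to the abstract above, the sketch below trains a wide two-layer ReLU network in the mean-field parameterization f(x) = (1/m) * sum_j a_j ReLU(w_j . x) with plain gradient descent; the 1/m factor is absorbed into the per-neuron step size, matching the mean-field time scale. The teacher model, width, step size and loss are illustrative assumptions, not the setting analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 2000
X = rng.normal(size=(n, d))
y = np.maximum(X @ rng.normal(size=d), 0.0)        # teacher: a single ReLU neuron

W = rng.normal(size=(m, d))                         # hidden weights, one neuron per row
a = rng.choice([-1.0, 1.0], size=m)                 # output weights
lr = 0.5                                            # per-neuron step (1/m absorbed)

for step in range(1001):
    pre = X @ W.T                                   # (n, m) pre-activations
    act = np.maximum(pre, 0.0)
    pred = act @ a / m                              # mean-field 1/m output scaling
    err = pred - y
    grad_a = act.T @ err / n                        # per-neuron gradients (without the 1/m factor)
    grad_W = ((err[:, None] * (pre > 0.0) * a).T @ X) / n
    a = a - lr * grad_a
    W = W - lr * grad_W
    if step % 200 == 0:
        print(step, 0.5 * np.mean(err ** 2))        # squared loss should decrease
```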
|
12:00 PM - 02:00 PM
|
|
Break
|
- Location
- SLMath: Online/Virtual
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
02:00 PM - 03:00 PM
|
|
Mean field theory of neural networks: From stochastic gradient descent to Wasserstein gradient flows
Andrea Montanari (Stanford University)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
Modern neural networks contain millions of parameters, and training them requires optimizing a highly non-convex objective. Despite the apparent complexity of this task, practitioners successfully train such models using simple first order methods such as stochastic gradient descent (SGD). I will survey recent efforts to understand this surprising phenomenon using tools from the theory of partial differential equations. Namely, I will discuss a mean field limit in which the number of neurons becomes large, and the SGD dynamics is approximated by a certain Wasserstein gradient flow. [Joint work with Adel Javanmard, Song Mei, Theodor Misiakiewicz, Marco Mondelli, Phan-Minh Nguyen]
- Supplements
-
--
|
03:00 PM - 03:30 PM
|
|
Break
|
- Location
- --
- Video
-
--
- Abstract
- --
- Supplements
-
--
|
03:30 PM - 04:30 PM
|
|
Nonlocal-interaction equations on graphs and gradient flows in nonlocal Wasserstein metric
Dejan Slepcev (Carnegie Mellon University)
|
- Location
- SLMath: Online/Virtual
- Video
-
- Abstract
We consider transport equations on graphs, where mass is distributed over vertices and is transported along the edges. The first part of the talk will deal with the graph analogue of the Wasserstein distance, in the particular case where the notion of density along edges is inspired by upwind numerical schemes. While this approach has both theoretical and computational advantages, the resulting distance is only a quasi-metric. We investigate this quasi-metric both on graphs and on more general structures where the set of "vertices" is an arbitrary positive measure. In the second part of the talk we will interpret the nonlocal-interaction equations on graphs as gradient flows, with respect to the graph-Wasserstein quasi-metric, of the nonlocal-interaction energy. We show that for graphs representing data sampled from a manifold, the solutions of the nonlocal-interaction equations on graphs converge to solutions of an integral equation on the manifold.
- Supplements
-
--
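The sketch below is a hedged illustration of a nonlocal-interaction equation on a graph with an upwind flux, in the spirit of the abstract: mass sits on vertices and moves along edges toward lower values of the nonlocal potential. The proximity graph, the attractive kernel |x - y|^2 / 2 and the explicit Euler time stepping are illustrative assumptions; this is not the precise quasi-metric construction from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
x = np.sort(rng.uniform(0.0, 1.0, n))                        # vertices sampled in [0, 1]
w = (np.abs(x[:, None] - x[None, :]) < 0.08).astype(float)   # edge weights (proximity graph)
np.fill_diagonal(w, 0.0)

Wmat = 0.5 * (x[:, None] - x[None, :]) ** 2                  # attractive kernel W(x, y) = |x-y|^2 / 2
rho = np.full(n, 1.0 / n)                                    # initial mass, uniform over vertices

dt = 0.05
for _ in range(500):
    V = Wmat @ rho                          # nonlocal potential V_i = sum_j W(x_i, x_j) rho_j
    v = V[:, None] - V[None, :]             # v_ij > 0 when the potential drops from i to j
    # upwind flux: mass leaves i along (i, j) at rate rho_i (v_ij)_+ and enters at rate rho_j (v_ij)_-
    flux = w * (rho[:, None] * np.maximum(v, 0.0) - rho[None, :] * np.maximum(-v, 0.0))
    rho = rho - dt * flux.sum(axis=1)       # explicit Euler step of the graph continuity equation

center = np.sum(rho * x)
print("total mass:", rho.sum())                               # conserved
print("spread:", np.sqrt(np.sum(rho * (x - center) ** 2)))    # shrinks under aggregation
```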
|
|