From an ODE to accelerated stochastic gradient descent: convergence rate and empirical results
[Moved Online] Hot Topics: Optimal transport and applications to machine learning and statistics, May 04, 2020 – May 08, 2020
Location: SLMath: Online/Virtual
stochastic gradient descent
convex optimization
accelerated algorithms
Nesterov's method
Su, Boyd, and Candès showed that Nesterov's method for accelerated gradient descent can be obtained as the discretization of an ordinary differential equation (ODE). By discretizing a modified ODE we show that: (i) we recover Nesterov's method for convex functions using a constant learning rate, and (ii) choosing a decreasing learning rate leads to an accelerated stochastic gradient descent algorithm. The learning rate for the stochastic algorithm decays at the rate k^(-3/4), which is novel. We also study a second ODE and obtain a corresponding algorithm for the strongly convex case.
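As context for the two regimes described above, here is a minimal sketch of Nesterov's method with a constant learning rate alongside a stochastic variant with a k^(-3/4) decay. The exact update derived from the talk's modified ODE is not reproduced here; the extrapolation weight k/(k+3) and the constant c in the schedule are standard illustrative choices, not the speakers' parameters.

```python
import numpy as np

def nesterov(grad, x0, lr, n_iter):
    """Nesterov's accelerated gradient descent with a constant learning rate.

    y_k is an extrapolated (momentum) point; x_{k+1} is a gradient step from y_k.
    """
    x = x_prev = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        y = x + k / (k + 3) * (x - x_prev)   # extrapolation / momentum step
        x_prev, x = x, y - lr * grad(y)      # gradient step at the point y
    return x

def accelerated_sgd(stoch_grad, x0, c, n_iter, rng):
    """Stochastic variant: noisy gradients with a decreasing learning rate.

    The c / (k+1)^(3/4) schedule mirrors the decay order mentioned in the
    abstract; the precise update from the modified ODE may differ (assumption).
    """
    x = x_prev = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        lr = c / (k + 1) ** 0.75             # decreasing step size ~ k^(-3/4)
        y = x + k / (k + 3) * (x - x_prev)
        x_prev, x = x, y - lr * stoch_grad(y, rng)
    return x

# Toy example: minimize the convex quadratic f(x) = 0.5 * ||x||^2 (minimizer 0).
grad = lambda x: x                           # exact gradient; L = 1
x_det = nesterov(grad, np.ones(5), lr=0.5, n_iter=200)

rng = np.random.default_rng(0)
noisy = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)  # gradient + noise
x_sto = accelerated_sgd(noisy, np.ones(5), c=0.5, n_iter=2000, rng=rng)
```

On this quadratic both runs drive the iterates toward the minimizer at 0; the stochastic iterate hovers near it at a scale set by the noise and the decaying step size.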
Convergence rates for the algorithms are established using a Lyapunov function approach which unifies the continuous-time, deterministic gradient, and stochastic gradient cases. The stochastic algorithms are accelerated (in the convex case and the stochastic approximation setting) in the sense that the rate constant improves over existing results. Existing results have a rate constant which depends on G^2, the bound on the norm of the stochastic gradient; the accelerated algorithm's rate constant depends on the smaller quantity \sigma^2, the variance of the stochastic gradient.
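To make the G^2 versus \sigma^2 distinction concrete, here is an illustrative toy computation (not from the talk): for a finite-sum least-squares problem, the uniform bound G^2 on the per-sample gradient norms can greatly exceed the variance \sigma^2 of the stochastic gradient around the full gradient, and \sigma^2 \le G^2 always holds.

```python
import numpy as np

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
n, d = 1000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.05 * rng.standard_normal(n)        # small label noise

x = np.zeros(d)                                        # evaluate away from optimum
per_sample = A * (A @ x - b)[:, None]                  # gradient of each summand
full = per_sample.mean(axis=0)                         # full (deterministic) gradient

G2 = (np.linalg.norm(per_sample, axis=1) ** 2).max()   # uniform bound G^2
sigma2 = (np.linalg.norm(per_sample - full, axis=1) ** 2).mean()  # variance sigma^2
# sigma2 <= G2 always: mean squared deviation from the mean cannot exceed
# the largest squared norm, so a rate constant scaling with sigma^2 is never worse.
```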
Numerical results, in both the stochastic approximation and the finite-sum settings, demonstrate that the accelerated algorithms can converge faster than SGD or momentum SGD. Moreover, the algorithms are easy to tune, in the sense that empirical results are robust to changes in the learning rate schedule and constants. This is joint work with Maxime Laborde.
Download: H.264 video, 928_28396_8330_From_an_ODE_to_Accelerated_Stochastic_Gradient_Descent_Convergence_Rate_and_Empirical_Results.mp4