Near-Optimal Compression in Near-Linear Time

[Virtual] Hot Topics: Foundations of Stable, Generalizable and Transferable Statistical Learning March 07, 2022 - March 10, 2022

March 08, 2022 (10:30 AM PST - 10:55 AM PST)

Speaker(s): Raaz Dwivedi (Harvard University)
Location: SLMath: Online/Virtual

Tags/Keywords

compression
MCMC
kernel
coresets
maximum mean discrepancy

Primary Mathematics Subject Classification No Primary AMS MSC

Secondary Mathematics Subject Classification No Secondary AMS MSC

Video

Near-Optimal Compression In Near-Linear Time

Abstract

We introduce Kernel Thinning-Compress++ (KT-Compress++), an algorithm based on two new procedures for compressing a distribution P nearly optimally and more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel k and O(n log^3 n) time, KT-Compress++ compresses an n-point approximation to P into a sqrt(n)-point approximation with better than Monte Carlo integration error rates for functions in the associated reproducing kernel Hilbert space (RKHS). First we show that for any fixed function KT-Compress++ provides dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that with high probability, the maximum discrepancy in integration error is O_d(n^{-1/2} sqrt(log n)) for compactly supported P and O_d(n^{-1/2} (log n)^{(d+1)/2} sqrt(loglog n)) for sub-exponential P on R^d. In contrast, an equal-sized i.i.d. sample from P suffers at least n^{-1/4} integration error. Our sub-exponential guarantees nearly match the known lower bounds for several settings, and while resembling the classical quasi-Monte Carlo error rates for uniform P on [0,1]^d they apply to general distributions on R^d and a wide range of commonly used universal kernels. En route, we introduce a new simple meta-procedure Compress++, that can speed up any thinning algorithm while suffering at most a factor of 4 n error. In particular, Compress++ enjoys a near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. We also use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for a range of kernels including Gaussian, Matern, Laplace, and B-spline kernels. Finally, we present several vignettes illustrating the practical benefits of KT-Compress++ over i.i.d. sampling and standard Markov chain Monte Carlo thinning with challenging differential equation posteriors in dimensions d = 2 to 100.

Supplements

	Near-Optimal Compression in Near-Linear Time 19.8 MB application/pdf	Download

Video/Audio Files

Near-Optimal Compression In Near-Linear Time

Troubles with video?

Please report video problems to itsupport@slmath.org.

See more of our Streaming videos on our main VMath Videos page.