Home /  Workshop /  Schedules /  The broken sample problem revisited: Proof of a conjecture by Bai and Hsing and extensions

The broken sample problem revisited: Proof of a conjecture by Bai and Hsing and extensions

Detection, Estimation, and Reconstruction in Networks April 21, 2025 - April 25, 2025

April 21, 2025 (09:30 AM PDT - 10:30 AM PDT)
Speaker(s): Yihong Wu (Yale University)
Location: SLMath: Eisenbud Auditorium, Online/Virtual
Primary Mathematics Subject Classification No Primary AMS MSC
Secondary Mathematics Subject Classification No Secondary AMS MSC
Video
No Video Uploaded
Abstract

Zoom Link

We revisit the classical broken sample problem: Two samples of iid data points X={X_1,...,X_n} and Y={Y_1,...,Y_m} are observed without correspondence, with m \leq n. The goal is to distinguish the following two hypotheses: Under the null hypothesis, X and Y are independent. Under the alternative hypothesis, X and Y are correlated in the sense that (X_{\pi(i)}, Y_i)'s are drawn independently from a bivariate distribution for some latent injection \pi: [m] \to [n]. Originally introduced by DeGroot, Feder, and Goel [AoMS 1971] to model matching records in census data, this problem has recently gained renewed interest due to its applications in data de-anonymization, data integration, and target tracking. Despite extensive research over the past decades, determining the precise detection threshold has remained an open problem even for equal sample sizes (m=n). Assuming m and n grow proportionally, in this work, we show that the sharp threshold is given by a spectral and an L_2 condition of the likelihood ratio operator, resolving a conjecture of Bai and Hsing [PTRF 2005] in the positive. Extensions to high dimensions and linear models are also discussed. This is based on joint work with Shuyang Gong (Peking University), Simiao Jiao (Duke), and Jiaming Xu (Duke).

Supplements No Notes/Supplements Uploaded
Video/Audio Files
No Video Files Uploaded