Mar 07, 2022 (Monday)
07:55 AM - 08:00 AM | Welcome
08:00 AM - 08:25 AM | Datamodels: Predicting Predictions with Training Data
Aleksander Madry (Massachusetts Institute of Technology)
- Location: SLMath: Online/Virtual
- Abstract:
Machine learning models tend to rely on an abundance of training data. Yet, understanding the underlying structure of this data---and models' exact dependence on it---remains a challenge.
In this talk, we will present a new framework---called datamodeling---for directly modeling predictions as functions of training data. This datamodeling framework, given a dataset and a learning algorithm, pinpoints---at varying levels of granularity---the relationships between train and test point pairs through the lens of the corresponding model class. Even in its most basic version, datamodels enable many applications, including discovering subpopulations, quantifying model brittleness via counterfactuals, and identifying train-test leakage.
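The basic recipe can be sketched directly: train the learning algorithm on many random training subsets, record its output on a fixed test example, and regress that output on the subset-membership indicator vector. The numpy sketch below is illustrative only; `train_and_output` is a hypothetical user-supplied function (whatever learner and output, e.g. a margin, one cares about), and the plain least-squares fit stands in for the regularized regression one would typically prefer. It is not the authors' implementation.

```python
import numpy as np

def fit_linear_datamodel(train_and_output, n_train, n_subsets=500,
                         subset_frac=0.5, seed=0):
    """Fit a linear datamodel for a single test example.

    train_and_output(idx) is a hypothetical helper: it trains the learner on the
    training points indexed by `idx` and returns a scalar output (e.g., the
    margin) on the fixed test example. Returns one weight per training point;
    large |weight| flags training points whose inclusion strongly moves the
    prediction on that test example.
    """
    rng = np.random.default_rng(seed)
    masks = np.zeros((n_subsets, n_train))
    outputs = np.zeros(n_subsets)
    k = int(subset_frac * n_train)
    for s in range(n_subsets):
        idx = rng.choice(n_train, size=k, replace=False)
        masks[s, idx] = 1.0
        outputs[s] = train_and_output(idx)
    # Least-squares fit of output ~ subset-membership mask (plus intercept).
    design = np.column_stack([np.ones(n_subsets), masks])
    coefs, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coefs[1:]
```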
08:30 AM - 08:55 AM | Domain Adaptation Under Structural Causal Models
Yuansi Chen (Duke University)
- Location: SLMath: Online/Virtual
- Abstract:
Domain adaptation (DA) arises as an important problem in statistical machine learning when the source data used to train a model is different from the target data used to test the model. Recent advances in DA have mainly been application-driven and have largely relied on the idea of a common subspace for source and target data. To understand the empirical successes and failures of DA methods, we propose a theoretical framework via structural causal models that enables analysis and comparison of the prediction performance of DA methods. This framework also allows us to itemize the assumptions needed for the DA methods to have a low target error. Additionally, with insights from our theory, we propose a new DA method called CIRM that outperforms existing DA methods when both the covariates and label distributions are perturbed in the target data. We complement the theoretical analysis with extensive simulations to show the necessity of the devised assumptions. Reproducible synthetic and real data experiments are also provided to illustrate the strengths and weaknesses of DA methods when parts of the assumptions in our theory are violated.
09:00 AM - 09:25 AM | Assessing Replicability Via Multi-lab Collaborations
Blake McShane (Northwestern University)
- Location: SLMath: Online/Virtual
- Abstract:
Multi-lab reproducibility collaborations such as the Reproducibility Project: Psychology (2015), the Experimental Economics Replication Project (2016), and the Social Sciences Replication Project (2018) have raised concern about the reproducibility of research findings in these fields. However, the definitions of replication used in these efforts are arguably a bit impoverished. Further, more subtle, nuanced, and circumspect assessments are possible in many multi-lab reproducibility collaborations.
In this talk, we provide such an assessment for a multi-lab reproducibility collaboration that examined a number of prominent effects from behavioral economics and social psychology. Our results show that the various outcomes examined exhibit a perhaps surprising degree of variability across the various labs and participants that took part in the project. This portends a much lower degree of replicability than may have previously been thought. Our results also show high correlation across the various outcomes, which brings a potential moderator to the fore. When this moderator is incorporated into the analysis, it is clear that it is strongly predictive of many of these outcomes and does indeed moderate effects. This suggests that the more subtle, nuanced, and circumspect assessments of replicability that we advocate also have the potential to spark theoretical developments.
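One generic way to quantify across-lab variability of the kind discussed above is a random-effects summary of lab-level estimates. The sketch below is the standard DerSimonian-Laird construction, included only as a point of reference; it is not the specific analysis presented in the talk.

```python
import numpy as np

def random_effects_summary(estimates, variances):
    """DerSimonian-Laird random-effects meta-analysis of lab-level estimates.
    Returns the pooled effect and the estimated between-lab variance tau^2."""
    estimates, variances = np.asarray(estimates, float), np.asarray(variances, float)
    w = 1.0 / variances                           # fixed-effect weights
    fixed = np.sum(w * estimates) / np.sum(w)
    q = np.sum(w * (estimates - fixed) ** 2)      # Cochran's Q heterogeneity statistic
    df = len(estimates) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-lab variance estimate
    w_star = 1.0 / (variances + tau2)             # random-effects weights
    pooled = np.sum(w_star * estimates) / np.sum(w_star)
    return pooled, tau2
```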
09:30 AM - 10:00 AM | Lunch / Dinner Break
10:00 AM - 10:25 AM | Elements of External Validity: Framework, Design, and Analysis
Erin Hartman (University of California, Berkeley)
- Location: SLMath: Online/Virtual
- Abstract:
External validity of causal findings is a focus of long-standing debates in the social sciences. While the issue has been extensively studied at the conceptual level, in practice few empirical studies include explicit analyses aimed at externally valid inferences. In this article, we make three contributions to improve empirical approaches for external validity. First, we propose a formal framework that encompasses four dimensions of external validity: X-, T-, Y-, and C-validity (populations, treatments, outcomes, and contexts). The proposed framework synthesizes diverse external validity concerns. We then distinguish two goals of generalization. To conduct effect-generalization (generalizing the magnitude of causal effects), we introduce three estimators of the target population causal effects. For sign-generalization (generalizing the direction of causal effects), we propose a novel multiple-testing procedure under weaker assumptions. We illustrate our methods through field, survey, and lab experiments as well as observational studies.
10:30 AM - 10:55 AM | Evaluating Replicability: Considerations for Analyses and Implications for Design
Jacob Schauer (Northwestern University)
- Location: SLMath: Online/Virtual
- Abstract:
As high-profile empirical research has questioned the replicability of scientific findings, it has become clear that there is no standard approach to designing and analyzing studies to evaluate replication. Ambiguity regarding key estimands for “replication” and the purpose of replication research has shaped statistical treatments of the topic and sparked debate in several fields. This talk sheds light on this ambiguity by identifying different possible statistical definitions of “replication” that could be studied. It then highlights relevant analysis methods and derives their statistical properties. Finally, it connects these properties to key implications for the design of primary studies, as well as subsequent replication attempts.
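As a concrete illustration of how much the choice of definition matters, here is one possible operationalization of "replication" (agreement of two studies' effects within sampling error); other definitions, such as sign agreement or both estimates reaching significance, generally give different answers. This is a generic example, not one of the estimands analyzed in the talk.

```python
import numpy as np
from scipy import stats

def consistency_test(est1, se1, est2, se2):
    """Test H0: the two studies estimate the same effect (theta1 == theta2)."""
    z = (est1 - est2) / np.sqrt(se1 ** 2 + se2 ** 2)
    return z, 2 * stats.norm.sf(abs(z))

# Original study d = 0.45 (SE 0.10) vs. replication d = 0.15 (SE 0.15):
# a "significant original, nonsignificant replication" pair whose effects
# are nonetheless not statistically distinguishable.
print(consistency_test(0.45, 0.10, 0.15, 0.15))
```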
11:00 AM - 11:15 AM | Break
11:15 AM - 12:15 PM | Discussion
- Location: SLMath: Online/Virtual
Mar 08, 2022 (Tuesday)
08:00 AM - 08:25 AM | Disentangling Confounding and Nonsense Associations Due to Dependence
Betsy Ogburn (Johns Hopkins University)
- Location: SLMath: Online/Virtual
- Abstract:
Nonsense associations can arise when an exposure and an outcome of interest exhibit similar patterns of dependence. Confounding is present when potential outcomes are not independent of treatment. This talk will describe how confusion about these two phenomena results in shortcomings in popular methods in three areas: causal inference with multiple treatments and unmeasured confounding, causal and statistical inference with social network data, and spatial confounding. For each of these areas I will demonstrate the flaws in existing methods and describe new methods that were inspired by careful consideration of dependence and confounding.
08:30 AM - 08:55 AM | Interpretable Sensitivity Analysis for the Baron–Kenny Approach to Mediation with Unmeasured Confounding
Peng Ding (University of California, Berkeley)
- Location: SLMath: Online/Virtual
- Abstract:
Mediation analysis assesses the extent to which the treatment affects the outcome through a mediator and the extent to which it operates through other pathways. As one of the most cited methods in empirical mediation analysis, the classic Baron–Kenny approach allows us to estimate the indirect and direct effects of the treatment on the outcome in linear structural equation models. However, when the treatment and the mediator are not randomized, the estimates of the direct and indirect effects from the Baron–Kenny approach may be biased due to unmeasured confounding among the treatment, mediator, and outcome. Building on Cinelli & Hazlett (2020), we provide a sharp and interpretable sensitivity analysis method for the Baron–Kenny approach to mediation in the presence of unmeasured confounding. We first generalize Cinelli & Hazlett (2020)’s sensitivity analysis method for linear regression to allow for heteroskedasticity and model misspecification. We then apply the general result to develop a sensitivity analysis method for the Baron–Kenny approach. Importantly, we express the sensitivity parameters in terms of the partial R^2s that correspond to the natural factorization of the joint distribution of the directed acyclic graph. Thus, they are interpretable as the proportions of variability explained by unmeasured confounding given the observed covariates. Moreover, we extend the method to deal with multiple mediators, based on a novel matrix version of the partial R^2. We prove that all our sensitivity bounds are attainable and thus sharp.
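For orientation, the single-regression result of Cinelli & Hazlett (2020) that the talk builds on is usually stated, in terms of hypothesized partial R^2 sensitivity parameters, as

$$
|\widehat{\mathrm{bias}}| \;=\; \mathrm{se}\big(\hat{\tau}_{\mathrm{res}}\big)\,
\sqrt{\frac{R^2_{Y \sim Z \mid D, X}\; R^2_{D \sim Z \mid X}}{1 - R^2_{D \sim Z \mid X}}\;\mathrm{df}},
$$

where Z is the unmeasured confounder, the point estimate and standard error come from the regression that omits Z, and df is that regression's residual degrees of freedom. This is a sketch of the cited result as commonly written, not the talk's extensions to heteroskedasticity, mediation, or multiple mediators.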
09:00 AM - 09:25 AM | Distribution Generalization in Underidentified Causal Models
Jonas Peters (University of Copenhagen)
- Location: SLMath: Online/Virtual
- Abstract:
We consider the problem of predicting a response Y from a set of covariates X when test and training distributions differ. We focus on a setting where such differences have causal explanations and the test distributions emerge from interventions. Causal models minimize the worst-case risk under arbitrary interventions on the covariates but may not always be identifiable from observational or interventional data. In this talk, we argue that underidentification and distribution generalization are closely connected. We propose to consider most predictive invariant models and discuss some of their properties. We also present limits of distribution generalization.
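To make "most predictive invariant model" concrete, here is a deliberately crude toy sketch for linear models with at least two discrete environments: among covariate subsets whose residuals show no detectable environment dependence, pick the best-predicting one. The invariance check used here (one-way ANOVA on residual means) is only a stand-in for proper invariance tests, and none of the talk's identifiability or generalization guarantees are reflected in it.

```python
import itertools
import numpy as np
from scipy import stats

def invariance_pvalue(resid, env):
    """Crude invariance check: do residual means differ across environments?"""
    groups = [resid[env == e] for e in np.unique(env)]
    return stats.f_oneway(*groups).pvalue

def most_predictive_invariant_set(X, y, env, alpha=0.05):
    """Among covariate subsets whose residuals look invariant across environments,
    return the subset with the smallest residual variance (toy selection rule)."""
    n, p = X.shape
    best, best_mse = None, np.inf
    for k in range(p + 1):
        for S in itertools.combinations(range(p), k):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            resid = y - Xs @ beta
            if invariance_pvalue(resid, env) > alpha:   # "invariant" = not rejected
                if resid.var() < best_mse:
                    best, best_mse = S, resid.var()
    return best
```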
09:30 AM - 10:00 AM | Lunch / Dinner Break
10:00 AM - 10:25 AM | An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?
Tamara Broderick (Massachusetts Institute of Technology)
- Location: SLMath: Online/Virtual
- Abstract:
One hopes that data analyses will be used to make beneficial decisions regarding people's health, finances, and well-being. But the data fed to an analysis may systematically differ from the data where these decisions are ultimately applied. For instance, suppose we analyze data in one country and conclude that microcredit is effective at alleviating poverty; based on this analysis, we decide to distribute microcredit in other locations and in future years. We might then ask: can we trust our conclusion to apply under new conditions? If we found that a very small percentage of the original data was instrumental in determining the original conclusion, we might expect the conclusion to be unstable under new conditions. So we propose a method to assess the sensitivity of data analyses to the removal of a very small fraction of the data set. Analyzing all possible data subsets of a certain size is computationally prohibitive, so we provide an approximation. We call our resulting method the Approximate Maximum Influence Perturbation. Our approximation is automatically computable, theoretically supported, and works for common estimators --- including (but not limited to) OLS, IV, GMM, MLE, MAP, and variational Bayes. We show that any non-robustness our metric finds is conclusive. Empirics demonstrate that while some applications are robust, in others the sign of a treatment effect can be changed by dropping less than 0.1% of the data --- even in simple models and even when standard errors are small.
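The flavor of the metric can be conveyed with a short sketch for OLS: rank observations by a first-order (influence-function) approximation to how much dropping each one moves a chosen coefficient, then ask whether removing the most influential small fraction could flip its sign. This is an illustrative approximation in the spirit of the method, not the authors' automatic implementation (which covers the much broader class of estimators listed above).

```python
import numpy as np

def drop_small_fraction_check(X, y, j, drop_frac=0.001):
    """Approximate OLS sensitivity check: can dropping `drop_frac` of the rows
    flip the sign of coefficient j? Uses a first-order leave-one-out
    approximation (leverage terms ignored) to rank observations."""
    n = X.shape[0]
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Approximate change in beta[j] if observation i is dropped.
    delta = -(X @ xtx_inv[:, j]) * resid
    k = max(1, int(drop_frac * n))
    worst = np.sort(delta)[:k] if beta[j] > 0 else np.sort(delta)[-k:]
    return beta[j], beta[j] + worst.sum()   # original vs. approximate post-drop value
```

A sign change between the two returned values suggests the conclusion is not robust to dropping that small fraction of the data.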
10:30 AM - 10:55 AM | Near-Optimal Compression in Near-Linear Time
Raaz Dwivedi (Harvard University)
- Location: SLMath: Online/Virtual
- Abstract:
We introduce Kernel Thinning-Compress++ (KT-Compress++), an algorithm based on two new procedures for compressing a distribution P nearly optimally and more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel k and O(n log^3 n) time, KT-Compress++ compresses an n-point approximation to P into a sqrt(n)-point approximation with better-than-Monte-Carlo integration error rates for functions in the associated reproducing kernel Hilbert space (RKHS). First, we show that KT-Compress++ provides dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that with high probability, the maximum discrepancy in integration error is O_d(n^{-1/2} sqrt(log n)) for compactly supported P and O_d(n^{-1/2} (log n)^{(d+1)/2} sqrt(log log n)) for sub-exponential P on R^d. In contrast, an equal-sized i.i.d. sample from P suffers at least n^{-1/4} integration error. Our sub-exponential guarantees nearly match the known lower bounds for several settings, and while they resemble the classical quasi-Monte Carlo error rates for uniform P on [0,1]^d, they apply to general distributions on R^d and a wide range of commonly used universal kernels. En route, we introduce a simple new meta-procedure, Compress++, that can speed up any thinning algorithm while suffering at most a factor-of-4 increase in error. In particular, Compress++ enjoys a near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. We also use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for a range of kernels including Gaussian, Matérn, Laplace, and B-spline kernels. Finally, we present several vignettes illustrating the practical benefits of KT-Compress++ over i.i.d. sampling and standard Markov chain Monte Carlo thinning with challenging differential equation posteriors in dimensions d = 2 to 100.
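As a point of reference, the maximum mean discrepancy used to evaluate such coresets can be computed directly. Below is a small numpy sketch with a Gaussian kernel, comparing a sample against an i.i.d.-thinned subset (the baseline that the talk's guarantees improve on); it is purely illustrative and is not the KT-Compress++ algorithm itself.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd(sample, coreset, bandwidth=1.0):
    """Maximum mean discrepancy between an n-point sample and its compressed coreset."""
    kxx = gaussian_kernel(sample, sample, bandwidth).mean()
    kyy = gaussian_kernel(coreset, coreset, bandwidth).mean()
    kxy = gaussian_kernel(sample, coreset, bandwidth).mean()
    return np.sqrt(max(kxx + kyy - 2 * kxy, 0.0))

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 2))
print(mmd(x, x[rng.choice(1024, 32, replace=False)]))   # i.i.d. thinning baseline
```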
11:00 AM - 11:15 AM | Break
11:15 AM - 12:15 PM | Discussion
- Location: SLMath: Online/Virtual
Mar 09, 2022 (Wednesday)
08:00 AM - 08:25 AM | A Precise High-Dimensional Asymptotic Theory for AdaBoost
Pragya Sur (Harvard University)
- Location: SLMath: Online/Virtual
- Abstract:
Ensemble learning algorithms represent a cornerstone of traditional statistical learning. Recent works on cross-study replicability (e.g. Patil and Parmigiani PNAS '18) demonstrate that ensembling single-study learners can significantly improve out-of-study generalization capabilities of learning algorithms. Despite these advances, sharp characterization of the generalization error of ensembling algorithms is often challenging, even in the single-study setting. Motivated by these considerations, we conduct an in-depth study of the generalization performance of boosting algorithms, specifically AdaBoost, a canonical ensemble learning algorithm. For this talk, we will focus on the problem of classification in the context of a single observed training dataset that is both high-dimensional and (asymptotically) linearly separable. We utilize the classical connection of AdaBoost with min-$\ell_1$-norm interpolation (Zhang and Yu AOS '05), and under specific data-generating models, establish an asymptotically exact characterization of the generalization performance. This, in turn, improves upon existing upper bounds in our setting. As a byproduct, our result formalizes the following fact in the context of AdaBoost: overparametrization helps optimization. Our analysis is relatively general and has potential applications for other ensembling approaches. Time permitting, I will discuss some of these extensions. This is based on joint work with Tengyuan Liang.
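The limiting object referenced above, min-$\ell_1$-norm interpolation of separable data, can be computed directly as a linear program. A small scipy sketch of that margin-rescaled LP, purely to fix ideas (the talk's contribution is the exact asymptotic risk analysis, not this computation):

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_interpolator(X, y):
    """Minimum-l1-norm interpolating classifier:
        minimize ||beta||_1  subject to  y_i * (x_i @ beta) >= 1  for all i.
    Requires linearly separable data; solved as an LP via beta = u - v, u, v >= 0."""
    n, p = X.shape
    c = np.ones(2 * p)                                   # objective: sum(u) + sum(v)
    A = np.hstack([-(y[:, None] * X), y[:, None] * X])   # -y_i x_i (u - v) <= -1
    res = linprog(c, A_ub=A, b_ub=-np.ones(n))           # default bounds are (0, None)
    if not res.success:
        raise ValueError("no interpolating solution (data not separable?)")
    return res.x[:p] - res.x[p:]
```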
08:30 AM - 08:55 AM | Prospects and Perils of Interpolating Models
Fanny Yang
- Location: SLMath: Online/Virtual
- Abstract:
In this talk, I will discuss several recent works from our group studying interpolating high-dimensional linear models. On the bright side, we show that for sparse ground truths, minimum-norm interpolators (including max-margin classifiers) can achieve high-dimensional asymptotic consistency and fast rates for isotropic Gaussian covariates. However, we also prove some caveats of such interpolating solutions in the context of robustness that are also observed for neural network learning: when performing adversarial training, interpolation can hurt robust test accuracy as compared to regularized solutions. Further, in the low-sample regime, the adversarially robust max-margin solution surprisingly can achieve lower robust accuracy than the standard max-margin classifier.
09:00 AM - 09:25 AM | Distributionally Robust Bayesian Nonparametric Regression
Jose Blanchet (Stanford University)
- Location: SLMath: Online/Virtual
- Abstract:
A distributionally robust Bayesian nonparametric regression estimator is the solution of a min-max game in which the statistician chooses a regression function of the observations (i.e., an element of L2) and the adversary, knowing the statistician's selection, maximizes the mean-squared error incurred over a Wasserstein-2-type ball around a full nonparametric Bayesian model, which we assume to be Gaussian on a suitable Hilbert space. We study this doubly infinite-dimensional game, establish the existence of a Nash equilibrium, and show how to evaluate it.
09:30 AM - 10:00 AM | Lunch / Dinner Break
10:00 AM - 10:25 AM | Calibrated Inference: Statistical Inference that Accounts for Both Sampling Uncertainty and Distributional Uncertainty
Dominik Rothenhaeusler (Stanford University)
- Location: SLMath: Online/Virtual
- Abstract:
During data analysis, analysts often have to make seemingly arbitrary decisions. For example, during data pre-processing there are a variety of options for dealing with outliers or inferring missing data. Similarly, many specifications and methods can be reasonable for addressing a given domain question. This may be seen as a hindrance to reliable inference, since conclusions can change depending on the analyst's choices. In this paper, we argue that this situation is an opportunity to construct confidence intervals that account not only for sampling uncertainty but also for some type of distributional uncertainty. Distributional uncertainty is closely related to other issues in data analysis, ranging from dependence between observations to selection bias and confounding. We demonstrate the utility of the approach on simulated and real-world data. This is joint work with Yujin Jeong.
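A toy illustration of the phenomenon motivating the talk: the same estimand computed under several reasonable pre-processing choices can spread more than its sampling-based standard error suggests. The snippet below only exhibits that spread; it is not the calibrated-inference procedure itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.3 * x + rng.standard_t(df=3, size=500)   # heavy-tailed noise
y[:5] += 15                                    # plus a few outliers

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

# Reasonable-looking analyst choices for handling outliers:
specs = {
    "keep all": (x, y),
    "drop |y| > 5": (x[np.abs(y) <= 5], y[np.abs(y) <= 5]),
    "winsorize at 5": (x, np.clip(y, -5, 5)),
}
estimates = {name: slope(*data) for name, data in specs.items()}
print(estimates)   # across-specification spread: a crude distributional-uncertainty signal
```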
10:30 AM - 10:55 AM | Assessing External Validity Over Worst-Case Subpopulations
Hongseok Namkoong (Columbia University)
- Location: SLMath: Online/Virtual
- Abstract:
Study populations are typically sampled from limited points in space and time, and marginalized groups are underrepresented. To assess the external validity of randomized and observational studies, we propose and evaluate the worst-case treatment effect (WTE) across all subpopulations of a given size, which guarantees positive findings remain valid over subpopulations. We develop a semiparametrically efficient estimator for the WTE that analyzes the external validity of the augmented inverse propensity weighted estimator for the average treatment effect. Our cross-fitting procedure leverages flexible nonparametric and machine learning-based estimates of nuisance parameters and is a regular root-n estimator even when nuisance estimates converge more slowly. On real examples where external validity is of core concern, our proposed framework guards against brittle findings that are invalidated by unanticipated population shifts.
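For context, the augmented inverse propensity weighted (AIPW) estimator of the average treatment effect that the WTE analysis builds on can be sketched as follows. This is the standard construction with assumed nuisance estimates mu0, mu1, and e; it is not the worst-case-subpopulation estimator itself.

```python
import numpy as np

def aipw_ate(y, t, mu0, mu1, e):
    """Augmented IPW estimate of the average treatment effect.
    y: outcomes, t: binary treatment indicator, mu0/mu1: outcome-model
    predictions under control/treatment, e: estimated propensity scores
    (all 1-d arrays of equal length)."""
    psi = (mu1 - mu0
           + t * (y - mu1) / e
           - (1 - t) * (y - mu0) / (1 - e))   # uncentered efficient influence function
    ate = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(psi))
    return ate, se
```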
11:00 AM - 11:15 AM | Break
11:15 AM - 12:15 PM | Discussion
- Location: SLMath: Online/Virtual
Mar 10, 2022 (Thursday)
08:00 AM - 08:25 AM | Veridical Network Embedding
Tian Zheng (Columbia University)
- Location: SLMath: Online/Virtual
- Abstract:
Embedding nodes of a large network into a metric (e.g., Euclidean) space has become an area of active research in statistical machine learning, which has found applications in the natural and social sciences. Generally, a representation of a network object is learned in Euclidean geometry and is then used for subsequent tasks regarding the nodes and/or edges of the network, such as community detection, node classification, and link prediction. Network embedding algorithms have been proposed in multiple disciplines, often with domain-specific notations and details. In addition, different measures and tools have been adopted to evaluate and compare the methods proposed under different settings, often dependent on the downstream tasks. As a result, it is challenging to study these algorithms in the literature systematically. Motivated by the recently proposed Veridical Data Science (VDS) framework, we propose a framework for network embedding algorithms and discuss how the principles of predictability, computability, and stability apply in this context. The utilization of this framework in network embedding holds the potential to motivate and point to new directions for future research.
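For concreteness, one simple embedding method (adjacency spectral embedding) together with a crude stability check in the spirit of the predictability-computability-stability principles can be sketched as follows. This is a toy that assumes a binary symmetric adjacency matrix; it is not the framework proposed in the talk.

```python
import numpy as np

def spectral_embedding(A, d=2):
    """Adjacency spectral embedding: top-d eigenpairs of a symmetric adjacency matrix."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def stability_check(A, d=2, flip_frac=0.01, reps=20, seed=0):
    """Flip a small fraction of entries and measure how much the embedded subspace
    moves (projection matrices are compared to avoid rotation ambiguity)."""
    rng = np.random.default_rng(seed)
    base = spectral_embedding(A, d)
    P0 = base @ np.linalg.pinv(base)
    n, drifts = A.shape[0], []
    for _ in range(reps):
        B = A.copy()
        i = rng.integers(0, n, size=max(1, int(flip_frac * n * n)))
        j = rng.integers(0, n, size=i.size)
        B[i, j] = B[j, i] = 1 - B[i, j]          # flip a few edges symmetrically
        E = spectral_embedding(B, d)
        P1 = E @ np.linalg.pinv(E)
        drifts.append(np.linalg.norm(P0 - P1) / np.linalg.norm(P0))
    return float(np.mean(drifts))
```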
08:30 AM - 08:55 AM | Bayesian Nonparametric Models for Treatment Effect Heterogeneity: Model Parameterization, Prior Choice, and Posterior Summarization
Jared Murray (University of Texas, Austin)
- Location: SLMath: Online/Virtual
- Abstract:
Bayesian nonparametric models are a popular and effective tool for inferring the heterogeneous effects of interventions. I will discuss how to carefully specify models and prior distributions to apply judicious regularization of heterogeneous effects. I will also discuss how to extract answers to scientific and policy questions from a fitted nonparametric model using posterior summarization, avoiding the problems incurred by using competing or incompatible model specifications to target different estimands. Together these tools provide a general recipe for obtaining stable, generalizable, and transferable insights about heterogeneous effects.
09:00 AM - 09:25 AM | Sim2Real Transfer in Robotics: Thoughts on Model Pruning and Robust Visual Transfer
Bradly Stadie (Toyota Technological Institute at Chicago)
- Location: SLMath: Online/Virtual
- Abstract:
We consider the problem of transferring robotic control from simulation to the real world. In particular, we consider two important sub-problems that are often faced: model size and visual robustness.
To decrease the size of our trained models, we develop a one-shot pruning technique for recurrent and time-series models that significantly reduces our model footprint while maintaining accuracy. This smaller model size is crucial for overcoming hardware limitations in robotics. For our vision system, we develop a new statistical process, Invariance Through Inference, for adapting visual systems from simulation to the real world. This process shows how we can use statistical inference at test time to extract robust visual features that are constant across simulated and real-world models.
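To fix ideas, one-shot magnitude pruning of a single weight matrix can be sketched in a few lines. This is a generic baseline technique, not the talk's pruning method for recurrent and time-series models, which is assumed to be more involved.

```python
import numpy as np

def one_shot_magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude entries so roughly `sparsity` fraction are zero."""
    w = weights.copy()
    k = int(sparsity * w.size)
    if k > 0:
        threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    return w

W = np.random.default_rng(0).normal(size=(128, 128))
print((one_shot_magnitude_prune(W) == 0).mean())   # approximately 0.9
```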
09:30 AM - 10:00 AM | Lunch / Dinner Break
10:00 AM - 10:25 AM | Predicting Out-of-Distribution Error with the Projection Norm
Jacob Steinhardt (UC Berkeley)
- Location: SLMath: Online/Virtual
- Abstract:
We will consider a metric---the "Projection Norm"---that predicts a model's performance on out-of-distribution (OOD) data, without access to ground truth labels. Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels. The more the new model's parameters differ from an in-distribution model, the greater the predicted OOD error. Empirically, this outperforms existing methods on both image and text classification tasks and across different network architectures. Theoretically, we connect our approach to a bound on the test error for overparameterized linear models. Furthermore, we find that Projection Norm is the only approach that achieves non-trivial detection performance on adversarial examples.
Joint work with Yaodong Yu, Zitong Yang, Alex Wei, and Yi Ma. https://arxiv.org/abs/2202.05834
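A schematic sketch of the recipe described above, with a linear classifier standing in for a deep network: the paper's procedure fine-tunes a pretrained network and compares its parameters to an in-distribution reference, so the setup below is a simplification for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def projection_norm_sketch(X_train, y_train, X_ood):
    """Toy Projection Norm: pseudo-label the OOD inputs with a reference model,
    refit a fresh model on those pseudo-labels, and return the parameter distance.
    A larger distance is taken to predict a larger OOD error.
    Assumes both classes appear among the pseudo-labels."""
    ref = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pseudo = ref.predict(X_ood)
    new = LogisticRegression(max_iter=1000).fit(X_ood, pseudo)
    return np.linalg.norm(ref.coef_ - new.coef_)
```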
10:30 AM - 10:55 AM | Structured Adaptation & Deep Learning: When Prediction Yields Adaptation
Zachary Lipton (Carnegie Mellon University)
- Location: SLMath: Online/Virtual
11:00 AM - 11:15 AM | Break
11:15 AM - 12:15 PM | Discussion
- Location: SLMath: Online/Virtual