Subsampling for Big Data Regression with Measurement Constraints

Published: 2024-08-21

Title: Subsampling for Big Data Regression with Measurement Constraints

Time: Thursday, August 22, 2024, 09:30-10:30

Venue: Room 415, School of Mathematics and Statistics, Renmin Street Campus

Speaker: Lin Wang

Organizer: School of Mathematics and Statistics

Abstract:

Despite the availability of extensive data sets, in many applications it is impractical to observe the responses or labels for all data points because of measurement constraints. To address this challenge, subsampling approaches can be employed to select a subset of design points from a large pool for observation, resulting in substantial savings in labeling costs. In this presentation, I will introduce our recent research on computationally feasible subsampling techniques. Our primary focus is on regression with labeled data, which includes linear regression, ridge regression, and nonparametric additive regression. For these regression tasks, we have developed sampling probabilities that aim to minimize the mean squared error of estimation and prediction. We will demonstrate the effectiveness of the proposed approaches through both theoretical analysis and extensive simulations.

About the speaker:

Lin Wang is an Assistant Professor of Statistics at Purdue University. Prior to joining Purdue in 2022, she was an Assistant Professor of Statistics at The George Washington University from 2019 to 2022. She obtained her PhD in Statistics in 2019 from the University of California, Los Angeles. Her research interests include sampling and subsampling, experimental design, and causal inference.