
Wavelet-based LASSO in functional linear regression

Posted on: 2011-09-11
Degree: Ph.D
Type: Thesis
University: Columbia University
Candidate: Zhao, Yihong
Full Text: PDF
GTID: 2440390002468939
Subject: Statistics
Abstract/Summary:
The linear regression model with a functional predictor X(t) and a scalar response Y is a topic of growing interest in statistics. Work in this area has concentrated on estimating a coefficient function ω(t) linking the response Y and the predictor X(t) through ∫ ω(t) X(t) dt. We propose to estimate ω(t) via a wavelet-based LASSO approach. Specifically, we perform the discrete wavelet transform (DWT) on the predictors and thus convert the functional regression problem into a high-dimensional variable selection problem. LASSO is then employed for variable selection and estimation in the wavelet domain. The coefficient function estimate in the original domain may then be obtained by applying the inverse DWT. We discuss various possibilities for determining the tuning parameter. A simulation study and real data analysis demonstrate that our proposed method shows better predictive ability and improved estimation accuracy compared to existing wavelet methods. This method is computationally efficient and can be easily implemented.

As is typically the case in functional linear regression, the dimension of the functional predictor far exceeds the number of samples. When the dimensionality of the data is high, an initial dimension reduction before applying well-developed statistical modeling techniques is often beneficial in practice. We propose a two-step estimating procedure. Specifically, we reduce the dimensionality by applying a pre-screening step followed by a well-developed variable selection step (e.g., penalized least squares). We describe screening strategies based on correlation learning, variance learning, and covariance learning.

In this thesis, the practical utility of the wavelet-based LASSO method is illustrated in an analysis of real data from the NHLBI-funded Coronary Artery Risk Development in Young Adults (CARDIA) study.
The main purpose of the study is to investigate the role of the autonomic nervous system (ANS) in the relationship between socioeconomic status (SES) and inflammatory biomarkers. RR interval variability (RRV) was used to measure ANS activity. In the literature, RRV has been treated as a scalar variable. We believe this extreme data reduction oversimplifies the complex structure of the data and may result in lost information. Therefore, we propose to treat RRV as a functional predictor in the analysis, in the hope of elucidating the complex relationship among ANS, SES, and the inflammatory biomarkers. I will discuss the ongoing work and future research directions at the end.

KEY WORDS: variable selection, functional data analysis, penalized linear regression, independent screening strategy...
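
The transform, select, and invert procedure described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation. Several choices are assumptions made purely for illustration: the Haar wavelet (the abstract does not name a wavelet family), a plain coordinate-descent LASSO solver, a fixed tuning parameter `lam`, simulated white-noise predictor curves, and the hypothetical function names `haar_dwt`, `haar_idwt`, `lasso_cd`, and `wavelet_lasso`. The integral is approximated by a Riemann sum on an equally spaced grid with spacing `dt`.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """Orthonormal Haar DWT of a list whose length is a power of two."""
    out, n = list(x), len(x)
    while n > 1:
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:n] = approx + detail  # coarse part first, details after it
        n = half
    return out

def haar_idwt(c):
    """Inverse of haar_dwt (same coefficient layout)."""
    out, n = list(c), 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out

def soft(v, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return math.copysign(max(abs(v) - lam, 0.0), v)

def lasso_cd(Z, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Z b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(Z), len(Z[0])
    beta, resid = [0.0] * p, list(y)
    col_ss = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Z[i][j]
                beta[j] = new
    return beta

def wavelet_lasso(X, y, lam, dt):
    """DWT each curve, run LASSO in the wavelet domain, invert the estimate."""
    n = len(X)
    Z = [haar_dwt(row) for row in X]
    p = len(Z[0])
    # Centre columns and response so that no explicit intercept is needed.
    means = [sum(Z[i][j] for i in range(n)) / n for j in range(p)]
    Zc = [[Z[i][j] - means[j] for j in range(p)] for i in range(n)]
    y_mean = sum(y) / n
    theta = lasso_cd(Zc, [v - y_mean for v in y], lam)
    # theta estimates the DWT of omega * dt, so rescale after inverting.
    omega_hat = [c / dt for c in haar_idwt(theta)]
    return theta, omega_hat

# Simulated example (assumed setup): n = 40 curves on p = 64 grid points.
random.seed(0)
n, p = 40, 64
dt = 1.0 / p
grid = [k * dt for k in range(p)]
omega_true = [2.0 if 0.25 <= t < 0.5 else 0.0 for t in grid]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(w * x for w, x in zip(omega_true, row)) * dt + random.gauss(0.0, 0.01)
     for row in X]

theta_hat, omega_hat = wavelet_lasso(X, y, lam=0.005, dt=dt)
print(sum(c != 0.0 for c in theta_hat), "nonzero wavelet coefficients out of", p)
```

The value `lam=0.005` is arbitrary; the abstract notes that several ways of choosing the tuning parameter are considered, and cross-validation would be one common option. In practice a dedicated wavelet library and an established LASSO solver (for example PyWavelets and scikit-learn) would normally replace the hand-written routines.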
Keywords/Search Tags:Functional, Regression, Wavelet-based LASSO, Linear, Variable selection, Data