
Research On Dimension Reduction Methods Of Several Important Data Types

Posted on: 2022-07-28    Degree: Doctor    Type: Dissertation
Country: China    Candidate: H Y Cheng    Full Text: PDF
GTID: 1480306323980319    Subject: Statistics
Abstract/Summary:
In recent years, with the development of data science and information technology, the volume and dimensionality of the data collected and analyzed in various fields have grown rapidly. Massive data carry a great deal of information, which can improve the predictive accuracy of models, but they also pose new challenges for statistical analysis and computation. Besides the information we are interested in, big data also contain a large amount of redundant information, which not only harms model accuracy but also increases computational cost. The key to this problem is therefore to develop suitable dimension reduction methods. There are two important research directions in dimension reduction: sufficient dimension reduction and variable selection. The former seeks combinations of variables that minimize information loss, while the latter aims to screen out important variables. Both directions have been studied extensively, for example the "inverse regression" family of sufficient dimension reduction methods and lasso-based variable selection. These methods are mainly designed for batch data with normal errors and for linear models; data and models with other characteristics, such as online data, heteroscedastic data and semi-parametric models, have received comparatively little attention. This dissertation studies such data and models and proposes corresponding sufficient dimension reduction and variable selection methods.

In Chapter 2, we propose an online sparse sliced inverse regression (OSSIR) method for online sufficient dimension reduction. Existing online sufficient dimension reduction methods focus on the case where the dimension p is small; we show that our method achieves better statistical accuracy and computational speed when p is large. The method has two key steps: extending online principal component analysis to iteratively obtain the eigenvalues and eigenvectors of the kernel matrix, and using the truncated gradient to achieve online L1 regularization. We also analyze the convergence of the extended candid covariance-free incremental PCA (CCIPCA) and of our method. Comparisons with several existing methods in simulations and real data applications demonstrate the effectiveness and efficiency of the method.
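To make the two steps concrete, the following Python sketch illustrates the general idea under our own simplifying assumptions; it is not the exact algorithm of Chapter 2. The running slice means act as surrogate observations for a cov(E[X|Y])-type SIR kernel matrix, a CCIPCA-style step tracks its leading eigenvector, and a simplified truncated-gradient step provides online L1 shrinkage. All parameter values and the slicing rule are illustrative only.

```python
import numpy as np


def ccipca_update(v, x, n):
    """One CCIPCA-style step: amnesic update of the leading eigenvector
    estimate v (scaled by its eigenvalue) using observation x, after n
    observations have been seen."""
    if np.allclose(v, 0.0):
        return x.copy()
    w = v / np.linalg.norm(v)
    return (n - 1) / n * v + (1.0 / n) * (x @ w) * x


def truncated_gradient(beta, threshold, shrink):
    """Simplified truncated-gradient step for online L1 regularization:
    entries with |beta_j| <= threshold are shrunk toward zero by `shrink`
    and clipped at zero, producing sparse direction estimates."""
    out = beta.copy()
    small = np.abs(beta) <= threshold
    out[small] = np.sign(beta[small]) * np.maximum(np.abs(beta[small]) - shrink, 0.0)
    return out


# Toy data stream with a single-index signal: the response depends on x
# only through its first coordinate, so the true direction is e_1.
rng = np.random.default_rng(0)
p, n_slices = 20, 5
v = np.zeros(p)                         # leading eigenvector estimate (times eigenvalue)
slice_means = np.zeros((n_slices, p))
slice_counts = np.zeros(n_slices)

for t in range(1, 501):
    x = rng.normal(size=p)
    y_val = x[0] + 0.1 * rng.normal()
    s = int((y_val + 3.0) // (6.0 / n_slices))      # assign to one of n_slices slices
    s = min(max(s, 0), n_slices - 1)
    slice_counts[s] += 1
    slice_means[s] += (x - slice_means[s]) / slice_counts[s]
    v = ccipca_update(v, slice_means[s], t)         # track eigenvector of the SIR kernel
    v = truncated_gradient(v, threshold=0.5, shrink=0.01)

direction = v / (np.linalg.norm(v) + 1e-12)         # estimated sparse SDR direction
```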
In Chapter 3, we propose a new sufficient dimension reduction method for high-dimensional data with heteroscedasticity. Starting from the candidate matrix derived from principal quantile regression and using some equivalences, we construct new artificial responses from the eigenvectors of the candidate matrix and then apply a lasso regression to obtain sparse dimension reduction directions. For the "large p, small n" case with p >> n, we use principal projection to solve the dimension reduction problem in a lower-dimensional subspace. Theoretical properties of the methodology are established. Comparisons with several existing methods in simulations and real data analysis demonstrate the advantages of our method for high-dimensional data with heteroscedasticity.

In Chapter 4, since quantile regression and distance covariance have been shown to be useful for heteroscedastic data and for sufficient dimension reduction respectively, we propose a new approach to sufficient dimension reduction based on a quantile version of the distance covariance. The method does not rely on the linearity condition and is robust to heteroscedasticity. Under mild conditions, the consistency of our estimator of the central sufficient dimension reduction subspace is established. To demonstrate its effectiveness, we compare the method with several existing sufficient dimension reduction methods in simulations and real data studies.

In Chapter 5, we apply generalized semiparametric theory to nonlinear sufficient dimension reduction problems, obtaining a new approach that relies on neither the linearity condition nor the constant variance condition. We discuss the consistency of the resulting estimators, take kernel sliced inverse regression as an example, and present the estimating equation and the solution procedure. Simulations and an empirical analysis demonstrate the effectiveness of the method when the linearity and constant variance conditions are violated.

In Chapter 6, we study simultaneous variable selection and estimation in high-dimensional partially linear models, under the assumptions that the nonparametric component resides in a reproducing kernel Hilbert space (RKHS) and that the vector of regression coefficients of the parametric component is sparse. A double penalty is used: a roughness penalty given by a squared semi-norm on the RKHS estimates the nonparametric component, and a penalty with the oracle property induces sparsity in the parametric part. Under regularity conditions, we establish the rate of convergence and the consistency of the parametric estimator together with the consistency of variable selection; furthermore, the estimators of the non-zero coefficients are shown to possess the asymptotic oracle property. Simulations and empirical studies demonstrate the performance of the proposed method.
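As a rough illustration of the double penalty in Chapter 6, a plausible generic form of the criterion (written in our own notation, not necessarily the exact formulation of the chapter) for the partially linear model y_i = x_i^{\top}\beta + f(z_i) + \varepsilon_i is

\min_{\beta \in \mathbb{R}^{p},\; f \in \mathcal{H}_{K}} \; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta - f(z_i)\bigr)^{2} \;+\; \lambda_{1}\, J^{2}(f) \;+\; \sum_{j=1}^{p} p_{\lambda_{2}}\bigl(|\beta_j|\bigr),

where \mathcal{H}_{K} is the RKHS containing the nonparametric component, J(\cdot) is a semi-norm on \mathcal{H}_{K} serving as the roughness penalty, p_{\lambda_{2}} stands for a generic penalty with the oracle property (for instance SCAD), and \lambda_{1}, \lambda_{2} are tuning parameters.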
Keywords/Search Tags: Dimension Reduction, Sufficient Dimension Reduction, Variable Selection, Online Learning, Heteroscedasticity, Reproducing Kernel Theory, Semiparametric Theory, Partially Linear Model