Sparse Principal Component Analysis Based on Group Lasso

Posted on: 2021-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: K S Chen
Full Text: PDF
GTID: 2370330602481391
Subject: Applied Mathematics
Abstract/Summary:
When dealing with high-dimensional data, too many variables seriously degrade the prediction accuracy and interpretability of a model, so it is particularly important to select accurately, from among many candidates, the independent variables that have an important influence on the response variable. The Lasso method proposed by Tibshirani shrinks the coefficients through a penalty function and thereby performs variable selection. Because of its good sparsity and computational speed, it has become one of the basic variable selection methods. Many scholars have since proposed new variable selection methods based on the Lasso, among which Group Lasso generalizes the Lasso to account for the group effect of variables. It is well known that principal component analysis (PCA) achieves dimension reduction by using fewer principal components than there are original variables. However, since each principal component is a linear combination of all the original variables, the meaning of a principal component is often difficult to explain. Sparse principal component analysis (SPCA) builds on PCA and the Lasso penalty function to make the principal component loadings sparse, which greatly improves the interpretability of the results. The idea of SPCA also builds a bridge between PCA and variable selection. Considering the relationships among SPCA, Group Lasso, and Lasso, we combine Group Lasso with SPCA and propose a sparse principal component analysis method based on Group Lasso.

The second chapter first introduces the basic definitions and properties of several classical variable selection methods. Then, from the perspective of the penalty function, it discusses the similarities and differences among ridge regression, Lasso, and Group Lasso for variables of different dimensions, through both theoretical and graphical analysis. Finally, to prepare for the theoretical analysis that follows, we give an algorithm for solving Group Lasso in the general case, drawing on the Sparse Group Lasso method.

The third chapter first introduces the theory and basic steps of PCA and uses a case analysis to point out the shortcomings of PCA results, namely non-sparsity and poor interpretability. On this basis, it analyzes the main idea of SPCA: PCA is recast as an equivalent ridge regression problem, and a Lasso penalty is then added to achieve sparsity. Finally, the numerical solution of SPCA is discussed, and an example illustrates that the sparse results of SPCA can greatly improve the interpretability of the model.

The innovation of this thesis is the combination of Group Lasso with SPCA. We propose a sparse principal component analysis method based on Group Lasso and give the corresponding numerical solution. Results on real data and on a large number of simulated data sets show that, when the variables have a group effect, the new method produces sparse results that respect the group structure, making it a reasonable extension and improvement of SPCA.
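The difference between element-wise sparsity (Lasso) and group-wise sparsity (Group Lasso) described above can be illustrated with the standard proximal, or shrinkage, steps for the two penalties. The following is only a minimal sketch and is not taken from the thesis: the function names, the sample coefficients, the grouping, and the square-root-of-group-size weight are all assumptions chosen for illustration.

```python
import math

def soft_threshold(beta, lam):
    """Lasso shrinkage: each coefficient is shrunk toward zero on its own."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta]

def group_soft_threshold(beta, groups, lam):
    """Group Lasso shrinkage: each group is scaled by its Euclidean norm,
    so a group is either kept (shrunk) as a whole or zeroed as a whole."""
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # Weighting by sqrt(group size) is a common convention (assumed here).
        thresh = lam * math.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out

# Hypothetical coefficients: one strong group and one weak group.
beta = [3.0, 4.0, 0.3, -0.2]
groups = [[0, 1], [2, 3]]

lasso_result = soft_threshold(beta, 0.25)
group_result = group_soft_threshold(beta, groups, 0.5)

print("Lasso:      ", lasso_result)
print("Group Lasso:", group_result)
```

In this toy input the Lasso step zeroes only one member of the weak group, while the group step removes the weak group entirely and keeps both members of the strong group. This is the kind of grouped sparsity pattern the abstract refers to.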
Keywords/Search Tags: Variable Selection, Group Effect, Group Lasso, Principal Component Analysis, Sparse Principal Component Analysis