Sparse Principal Component Regression Of Binary Data

Posted on: 2022-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: B M Wen
Full Text: PDF
GTID: 2480306740457104
Subject: Statistics
Abstract/Summary:
Binary data are common in medicine, biology and other disciplines, and as modern society develops, people encounter large amounts of such data every day. It is therefore necessary to analyze large binary data sets in order to draw conclusions and make predictions and judgments. In statistics, the most common approach is to check the data set for multicollinearity before regression, that is, to treat the multicollinearity first and then predict within a regression framework. For multicollinearity in binary data, Logistic Principal Component Analysis already exists; it combines the ideas of logistic regression and classical principal component analysis. To identify which variables play a decisive role in each principal component, Sparse Logistic Principal Component Analysis (SPCA) was then developed. However, these two principal component methods for binary data only treat the multicollinearity of the data and have not been carried further into regression. On the regression side, the existing methods based on principal components are sparse principal component regression and sparse principal component regression for generalized linear models. Neither is well suited to the case where both the explanatory variables and the response variable are binary. This thesis therefore combines sparse logistic principal component analysis for binary data with the existing sparse principal component regression method to study sparse principal component regression for binary data, where both the explanatory variables and the response variable are binary. After the asymptotic properties of sparse principal component regression for binary data are proved, the ADMM algorithm is modified for this problem. The proposed regression method and the existing sparse principal component regression for generalized linear models are then both used to estimate coefficients on simulated binary data. The simulation experiments show that the existing methods do not represent a generalized linear relationship between the explanatory variables and the response variable well, while the proposed method achieves relatively high prediction accuracy. Finally, medical SPECT data, in which both the explanatory variables and the response variable are binary, are analyzed as a real-data example.
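The two-stage idea described in the abstract (reduce correlated binary predictors to a few components, then regress a binary response on them with a sparsity penalty) can be illustrated with a minimal sketch. This is not the thesis's method: it swaps in classical PCA for sparse logistic PCA and proximal gradient descent for ADMM, and the simulated data, the function names (pc_scores, l1_logistic) and the penalty value lam are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pc_scores(X, k):
    # Scores on the top-k classical principal components of column-centered X.
    # (Stand-in for sparse logistic PCA, which the thesis uses instead.)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def l1_logistic(Z, y, lam=0.05, steps=2000):
    # L1-penalized logistic regression fitted by proximal gradient descent.
    # (Stand-in for the modified ADMM algorithm mentioned in the abstract.)
    n, k = Z.shape
    A = np.hstack([np.ones((n, 1)), Z])        # first column is the intercept
    L = 0.25 * np.linalg.norm(A, 2) ** 2 / n   # Lipschitz bound on the gradient
    w = np.zeros(k + 1)
    for _ in range(steps):
        grad = A.T @ (sigmoid(A @ w) - y) / n
        w = w - grad / L
        w[1:] = soft_threshold(w[1:], lam / L)  # intercept is not penalized
    return w[0], w[1:]

# Hypothetical simulated data: binary predictors and a binary response
# that both depend on two latent factors.
rng = np.random.default_rng(0)
n, p, k = 400, 12, 3
F = rng.normal(size=(n, 2))
W = rng.normal(size=(2, p))
X = (rng.random((n, p)) < sigmoid(2.0 * F @ W)).astype(float)
y = (rng.random(n) < sigmoid(2.0 * F[:, 0])).astype(float)

Z = pc_scores(X, k)
b0, beta = l1_logistic(Z, y)
acc = np.mean((sigmoid(b0 + Z @ beta) > 0.5) == (y == 1))
print("coefficients on components:", beta)
print("in-sample accuracy:", acc)
```

Because the components are computed before the response is looked at, this sketch is a plain two-step procedure; a method that estimates the components and the regression coefficients jointly, as the abstract suggests, would couple the two steps in a single objective.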
Keywords/Search Tags: Principal component analysis, Sparsity, Linear regression, Maximum likelihood estimate, Consistency