
Statistical Analysis Of High-dimensional Data Based On Feature Selection

Posted on: 2019-05-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Su
GTID: 1318330566464492
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Dimension reduction is an important issue in statistical learning and data mining, and it has found increasingly wide use across many fields. It matters most when the influence of some of the high-dimensional covariates (noise and redundant variables) on the data can be ignored. This dissertation addresses the following topics: (1) outlier detection based on feature selection for classification data; (2) variable selection for unsupervised clustering of numerical or categorical data; (3) feature selection for progressively Type-? censored data with high-dimensional covariates, using a LASSO-type partial likelihood function for the Cox proportional hazards regression model.

Data exploration becomes difficult when outliers are encountered. We propose a new method that combines multiple correlation coefficients, dimension reduction based on feature selection and t-testing, and normalized-mutual-information-based feature selection to find the outlier points and the corresponding coordinate axes. Numerical simulations demonstrate its performance.

For unsupervised clustering analysis, Rodriguez and Laio (2014) proposed a fast density-peak search algorithm that improves accuracy and reduces complexity, since the algorithm needs no iteration in its implementation. The merits of the algorithm are extended to the clustering of data streams. In combination with Sparse PCA for dimension reduction, it is also applied to high-dimensional simulated datasets and to the real Olivetti Database of Faces, in order to evaluate its performance in high-dimensional data spaces.

When high-dimensional covariates are included and the influence of some of them can be ignored, the model needs to be simplified through informative feature selection. The dissertation therefore also addresses a LASSO-type partial-likelihood-based feature selection method for progressively Type-? censored data with high-dimensional covariates. Its performance is illustrated on the Veterans Administration lung cancer and primary biliary cirrhosis (PBC) datasets.
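
For readers unfamiliar with the density-peak search of Rodriguez and Laio (2014) referred to in the abstract, the following is a minimal Python sketch of the basic, non-iterative idea. It is not the dissertation's implementation: the function name `density_peaks`, the Gaussian kernel for the local density, the cutoff parameter `d_c`, and the choice of centers by the largest product of density and separation are all assumptions made for illustration. The data-stream extension and the Sparse PCA step are not shown.

```python
import numpy as np

def density_peaks(X, d_c, n_clusters):
    """Illustrative density-peak clustering (hypothetical helper).

    X: (n, p) data array; d_c: cutoff distance (assumed tuning parameter);
    n_clusters: number of centers to keep (assumed to be chosen by the user).
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho_i with a Gaussian kernel; subtract the self term exp(0) = 1.
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0

    # Visit points from highest to lowest density.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)                 # distance to nearest denser point
    nearest_higher = np.full(n, -1)     # index of that denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = D[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        nearest_higher[i] = j

    # Centers are points that are both dense and far from any denser point.
    gamma = rho * delta
    gamma[order[0]] = np.inf            # the densest point is always kept as a center
    centers = np.argsort(-gamma)[:n_clusters]

    # Single pass, no iteration: each remaining point inherits the label
    # of its nearest denser neighbour, which has already been labelled.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

Calling `density_peaks(X, d_c=0.5, n_clusters=2)` on a two-dimensional array `X` returns one label per row and the indices of the chosen centers. In practice the cutoff `d_c` and the number of centers are usually read off a plot of density against separation, so the fixed `n_clusters` argument here is a simplification.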
Keywords/Search Tags: High-dimensional data, Sparse principal component analysis, Feature selection, Outlier detection, Clustering analysis, Survival analysis