High Dimensional Feature Screening And Discriminant Analysis Via Elliptical Copula Model

Posted on: 2021-07-27  Degree: Master  Type: Thesis
Country: China  Candidate: H Sun  Full Text: PDF
GTID: 2480306221494734  Subject: Statistics
Abstract/Summary:
In the big data era, high-dimensional non-Gaussian data are frequently encountered, which poses great challenges for traditional analysis. When the dimension p is greater than the sample size n, the Gram matrix X'X is singular, so the least squares estimator is not well defined and traditional regression analysis is no longer effective; feature screening of the covariates is therefore essential to reduce the dimensionality. We use the elliptical distribution to propose a highly flexible semi-parametric regression model, the Multi-response Trans-Elliptical Regression (MTER) model, which can capture the heavy-tailed characteristics and tail dependence of both responses and covariates. The MTER model concerns feature screening for multiple responses and covers a large class of regression models, such as the multi-response linear model and the multi-response additive regression model. Similarly, the classical Fisher linear discriminant analysis is asymptotically equivalent to random guessing when the dimensionality of the covariates increases and the Gaussian assumption is violated. We therefore also propose Integrative Copula Discrimination Analysis (ICDA) for two-class classification based on multi-source non-Gaussian data.

We investigate the feature screening procedure for the MTER model, in which Kendall's tau-based canonical correlation estimators are proposed to characterize the correlation between each transformed predictor and the multivariate transformed responses. The main idea is to replace the classical canonical correlation ranking index with a carefully constructed nonparametric version; we refer to the proposed procedure as Rank-based Canonical Correlation (RCC) screening. The sure screening property and the ranking consistency property are established for the proposed procedure. Simulation results show that the proposed method distinguishes the informative features from the unimportant ones far more effectively than some state-of-the-art competitors, especially for heavy-tailed distributions and high-dimensional responses. Finally, a real data example is given to illustrate the effectiveness of the proposed procedure.

We also relax the Gaussian assumption and construct an integrative classifier via a regularized optimization problem for multiple datasets, in which a Gaussian copula model is adopted to relax the normality assumption and a group-lasso-type penalty is used to fully exploit information from multi-source data. A straightforward approach to multi-source data is to analyze each data source separately; the ICDA classifier, however, achieves a smaller classification error rate than running copula discriminant analysis on each data source individually. Thorough numerical studies are conducted to assess the finite-sample performance of the new procedure, and the results show that the misclassification error rate is close to the Bayes error rate and that ICDA is an efficient alternative to some state-of-the-art competitors. Finally, a real gene-expression data example is given to illustrate the empirical advantages and usefulness of the ICDA procedure, in which ICDA has better prediction performance and identifies more of the critical genes that are essential to discriminant analysis.
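The rank-based screening idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the exact estimator, so the code below assumes one plausible form, in which each latent correlation is estimated by the sine transform sin(pi/2 * tau) of Kendall's tau and each predictor is scored by the squared canonical correlation r' R_YY^{-1} r between that single predictor and the response vector. The function names (`kendall_tau`, `latent_corr`, `solve`, `rcc_screen`), the `ridge` stabilizer, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau by O(n^2) pair counting (assumes continuous data, no ties)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for k in range(i + 1, n):
            prod = (x[i] - x[k]) * (y[i] - y[k])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def latent_corr(x, y):
    """Sine transform of Kendall's tau (assumed latent-correlation estimator)."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, m))
        z[r] = (aug[r][m] - tail) / aug[r][r]
    return z


def rcc_screen(X_cols, Y_cols, ridge=1e-3):
    """Score each predictor by r' R_YY^{-1} r and rank predictors by that score.

    X_cols, Y_cols: lists of columns (each a list of n observations).
    ridge: small diagonal term (an assumption) that keeps R_YY invertible.
    """
    q = len(Y_cols)
    R_yy = [[1.0 + ridge if a == b else latent_corr(Y_cols[a], Y_cols[b])
             for b in range(q)] for a in range(q)]
    utils = []
    for xj in X_cols:
        r = [latent_corr(xj, yk) for yk in Y_cols]
        z = solve(R_yy, r)
        utils.append(sum(ri * zi for ri, zi in zip(r, z)))
    order = sorted(range(len(X_cols)), key=lambda j: -utils[j])
    return utils, order


# Simulated example: only predictors 0 and 1 drive the two responses.
random.seed(0)
n, p = 80, 10
Z = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]  # latent predictors
X_cols = [[math.exp(v) for v in col] for col in Z]  # monotone transform, skewed margins
Y_cols = [[(Z[k][i] + 0.5 * random.gauss(0, 1)) ** 3 for i in range(n)]
          for k in range(2)]  # monotone transform, heavy-tailed margins
utils, order = rcc_screen(X_cols, Y_cols)
print("top-ranked predictors:", order[:2])
```

Because Kendall's tau depends only on ranks, the scores are unchanged by the monotone transformations applied to the predictors and responses, which is the property that makes a rank-based index attractive for heavy-tailed data.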
Keywords/Search Tags: High-dimensionality, Elliptical Distribution, Kendall's tau, Feature Screening, Discriminant Analysis