Feature Screening For Ultrahigh Dimensional Discriminant Analysis With Mixed Data

Posted on: 2022-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Gao
Full Text: PDF
GTID: 2568306323969659
Subject: Statistics
Abstract/Summary:
Advances in science and technology have produced large volumes of high dimensional data in fields such as medicine, genome association analysis, and finance. Many classical statistical methods break down on such data. In addition, including a large number of variables in a model reduces both the accuracy of statistical inference and the interpretability of the model. Variable selection is a common way to cope with high dimensional data: it performs variable selection and parameter estimation simultaneously by optimizing a specific objective function. However, when the number of variables grows exponentially with the sample size, many variable selection methods are no longer effective. Some scholars have therefore proposed feature screening methods, which first reduce the ultrahigh dimensional data to an appropriate size; variable selection methods are then applied to the reduced-dimensional data. Doing so can improve the accuracy of inference and the interpretability of the model.

This thesis studies feature screening for ultrahigh dimensional discriminant analysis with mixed data. Mixed data are data that contain both categorical and numerical variables. Such data are very common in applications, yet there is little existing literature on ultrahigh dimensional mixed data. The few methods that can handle such data do so by discretizing variables, which reduces the accuracy of inference to some extent. Further research is therefore needed. This thesis makes the following contributions. We propose PV-SIS, a p-value-based feature screening method for ultrahigh dimensional mixed data, and prove that it has the sure screening property under certain conditions. We examine the finite-sample performance of the proposed method through numerical studies; the experimental results show that it is effective at identifying the active variables in mixed data. We also demonstrate the effectiveness of PV-SIS in applications through an empirical analysis.
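
The general idea of p-value-based marginal screening on mixed data can be illustrated with a short Python sketch. It is not the PV-SIS procedure itself: the abstract does not say which tests PV-SIS uses, so the choices here are assumptions made only for illustration. Numerical features are scored with a one-way ANOVA F statistic, categorical features with a Pearson chi-square statistic, and both are turned into p-values by a permutation test so that the two feature types are ranked on a common scale without discretizing the numerical ones. All names (`f_statistic`, `chi2_statistic`, `permutation_p_value`, `screen`) and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def f_statistic(x, y):
    """One-way ANOVA F statistic of a numerical feature x across the classes in y."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    n, k = len(x), len(groups)
    grand_mean = sum(x) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def chi2_statistic(x, y):
    """Pearson chi-square statistic of a categorical feature x against the classes in y."""
    n = len(x)
    joint, cx, cy = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (joint[(a, b)] - cx[a] * cy[b] / n) ** 2 / (cx[a] * cy[b] / n)
        for a in cx
        for b in cy
    )


def permutation_p_value(x, y, stat, n_perm, rng):
    """Share of label permutations whose statistic is at least the observed one."""
    observed = stat(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if stat(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive


def screen(features, y, d, n_perm=199, seed=0):
    """Keep the d features with the smallest marginal p-values."""
    rng = random.Random(seed)
    p_values = {}
    for name, x in features.items():
        # Assumption: string-valued columns are categorical, everything else numerical.
        stat = chi2_statistic if isinstance(x[0], str) else f_statistic
        p_values[name] = permutation_p_value(x, y, stat, n_perm, rng)
    ranked = sorted(p_values, key=p_values.get)
    return ranked[:d], p_values


# Toy data: two classes of 20 observations, two informative and two noise features.
y = [0] * 20 + [1] * 20
features = {
    "num_signal": [0.1 * (i % 5) for i in range(20)] + [3 + 0.1 * (i % 5) for i in range(20)],
    "cat_signal": ["a"] * 18 + ["b"] * 2 + ["b"] * 18 + ["a"] * 2,
    "num_noise": [i % 4 for i in range(40)],  # same distribution in both classes
    "cat_noise": ["u", "v"] * 20,             # same distribution in both classes
}
selected, p_values = screen(features, y, d=2)
print(selected, p_values)
```

In an ultrahigh dimensional setting the number of retained features `d` would be far smaller than the number of candidates, and a variable selection method would then be applied to the retained set, as the abstract describes.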
Keywords/Search Tags:Feature Screening, Ultrahigh Dimensional Data, Sure Independence Screening, P-Value