Feature Selection And Application Of Ultra High Dimensional Data With Spherical Variables Based On CC-SIS

Posted on:2024-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:M J Wang

GTID:2530307079461534

Subject:Statistics

Abstract/Summary:

With the advent of the big data era, high-dimensional data have become increasingly prevalent in fields such as medicine, biology, and economics. In these fields, high-dimensional datasets often contain spherical variables that carry crucial information, such as disease onset time, wind direction, and temporal data. Fast, effective, and stable feature screening for ultra-high-dimensional data containing spherical variables is therefore of considerable practical importance. This study addresses feature selection in high-dimensional data with spherical variables and examines the efficacy of the Conditional Correlation Sure Independence Screening (CC-SIS) method with a random forest kernel. The research consists of two parts. (1) Using the random forest algorithm as an adaptive kernel function, a random forest kernel CC-SIS method is proposed within the CC-SIS framework, and its convergence and selection accuracy are then investigated through numerical experiments. (2) The method is extended to real data: the selected features are used to train a classification model, and the feature screening efficacy is evaluated by five-fold cross-validation. Numerical experiments on real-world datasets examine the feature selection performance of the random forest kernel CC-SIS method, together with the performance and effectiveness of a fusion model that combines a varying coefficient model, a logistic regression (LR) model, and a convolutional neural network (CNN) model for classification. Feature selection performance is assessed by comparing, in simulations and experiments, several CC-SIS methods based on different kernel functions with the classical SIS method.

The results show that, for high-dimensional data with spherical variables, the random forest kernel CC-SIS method achieves the highest feature selection accuracy. The vMF kernel and Gaussian kernel CC-SIS methods follow, with slightly lower accuracy than the random forest kernel CC-SIS method, while the EP kernel CC-SIS method and the SIS method show comparatively moderate feature selection accuracy.

For real-world high-dimensional data, the classification accuracy obtained on the selected features is a key criterion for judging the feature selection process. Robust classification accuracy on real data requires not only an appropriate feature selection method but also an appropriate classification model. Logistic regression and neural network models achieve high classification accuracy on datasets with favorable feature selection outcomes, but their performance drops on datasets with comparatively moderate feature selection outcomes. The varying coefficient model, by contrast, remains stable and achieves accurate classification on the datasets produced by the CC-SIS methods with different kernel functions as well as by the SIS method. By integrating these three models, the fusion model improves the classification accuracy of the varying coefficient model while preserving its stability, yielding a substantial improvement in classification performance.
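The CC-SIS variants compared above differ in the kernel used to weight observations by how close their spherical variables are. A minimal sketch of three such kernels follows, assuming the spherical variable is stored as a unit vector (for example a wind direction θ mapped to (cos θ, sin θ)). The exact kernel forms are assumptions, since the abstract does not give them: the Gaussian and Epanechnikov kernels are applied to the geodesic angle, and "EP" is read here as Epanechnikov. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle(u, v):
    """Geodesic (great-circle) distance in radians between unit vectors u and v."""
    return math.acos(max(-1.0, min(1.0, dot(u, v))))  # clamp absorbs rounding error


def vmf_kernel(u, v, kappa):
    """von Mises-Fisher kernel exp(kappa * (u.v - 1)); equals 1 when u == v.

    The usual normalizing constant is dropped because it cancels in
    kernel-weighted averages. Larger kappa means a narrower window.
    """
    return math.exp(kappa * (dot(u, v) - 1.0))


def gaussian_kernel(u, v, h):
    """Gaussian kernel on the geodesic angle with bandwidth h (assumed form)."""
    t = angle(u, v) / h
    return math.exp(-0.5 * t * t)


def epanechnikov_kernel(u, v, h):
    """Epanechnikov kernel on the geodesic angle; zero beyond bandwidth h (assumed form)."""
    t = angle(u, v) / h
    return 0.75 * (1.0 - t * t) if t < 1.0 else 0.0


def direction(degrees):
    """Map a compass direction in degrees to a unit vector on the circle."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r))


# 10 and 350 degrees are both 10 degrees from north, so they receive the same
# weight even though their raw angle values differ by 340.
north = direction(0)
for deg in (10, 90, 180, 350):
    v = direction(deg)
    print(deg,
          round(vmf_kernel(north, v, kappa=4.0), 3),
          round(gaussian_kernel(north, v, h=0.5), 3),
          round(epanechnikov_kernel(north, v, h=1.0), 3))
```

For unit vectors, ||u - v||² = 2 - 2u·v, so a Gaussian kernel on the Euclidean (chord) distance with bandwidth h is exactly the vMF kernel with κ = 1/h². Applying the Gaussian to the angle instead is what keeps the two kernels distinct in this sketch.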
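CC-SIS ranks each covariate X_j by the average squared conditional correlation between X_j and Y given the index variable U, score_j = (1/n) Σ_i ρ̂²(X_j, Y | U = U_i), where the conditional moments are estimated by kernel (Nadaraya-Watson) smoothing, and then keeps the top d covariates. The sketch below follows that recipe with a vMF kernel on a spherical U. The function names, the default κ, and the simulated data are illustrative assumptions, not the thesis's code or settings.

```python
import math
import random


def vmf_weight(u, v, kappa):
    """Unnormalized von Mises-Fisher weight exp(kappa * (u.v - 1)) for unit vectors."""
    return math.exp(kappa * (sum(a * b for a, b in zip(u, v)) - 1.0))


def cc_sis_scores(X, y, U, kappa=5.0):
    """Score each column j of X by (1/n) * sum_i rho_j(U_i)^2.

    rho_j(u) is the kernel-weighted (Nadaraya-Watson) estimate of the
    conditional correlation corr(X_j, Y | U = u).
    """
    n, p = len(X), len(X[0])
    # W[i][k] = K(U_i, U_k); computed once and reused for every column.
    W = [[vmf_weight(U[i], U[k], kappa) for k in range(n)] for i in range(n)]
    totals = [sum(row) for row in W]
    # Conditional moments of Y do not depend on j, so compute them once.
    my = [sum(w * yk for w, yk in zip(W[i], y)) / totals[i] for i in range(n)]
    vy = [sum(w * yk * yk for w, yk in zip(W[i], y)) / totals[i] - my[i] ** 2
          for i in range(n)]
    scores = []
    for j in range(p):
        xj = [row[j] for row in X]
        acc = 0.0
        for i in range(n):
            w, s = W[i], totals[i]
            mx = sum(a * b for a, b in zip(w, xj)) / s
            vx = sum(a * b * b for a, b in zip(w, xj)) / s - mx ** 2
            cxy = sum(a * b * c for a, b, c in zip(w, xj, y)) / s - mx * my[i]
            denom = vx * vy[i]
            rho = cxy / math.sqrt(denom) if denom > 0 else 0.0
            acc += rho * rho
        scores.append(acc / n)
    return scores


def cc_sis_select(X, y, U, d, kappa=5.0):
    """Indices of the d columns with the largest CC-SIS scores."""
    scores = cc_sis_scores(X, y, U, kappa)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:d]


def random_unit_vector(dim=3):
    """Uniform random point on the unit sphere (normalized Gaussian vector)."""
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


# Simulated data: only column 2 is active, and its coefficient varies with U.
random.seed(0)
n, p = 120, 8
U = [random_unit_vector() for _ in range(n)]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [(2.0 + U[i][2]) * X[i][2] + 0.3 * random.gauss(0.0, 1.0) for i in range(n)]
selected = cc_sis_select(X, y, U, d=3)
print("kept columns:", selected)
```

Swapping `vmf_weight` for another kernel on the sphere, or for a precomputed data-driven weight matrix, changes only how W is built; the ranking step stays the same, which is what makes it straightforward to compare several kernels within one CC-SIS framework.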
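A common way to turn a random forest into a kernel, and one plausible reading of the "random forest kernel" above, is the proximity measure: two observations are considered similar in proportion to the number of trees in which they land in the same leaf. The sketch assumes leaf assignments are already available, for instance from scikit-learn's `apply` method on a fitted forest, which returns an array of shape (n_samples, n_estimators). The abstract does not state which response the forest is fitted on, so that choice is left open. `rf_proximity_kernel` and the leaf ids below are hypothetical.

```python
def rf_proximity_kernel(leaves):
    """Random forest proximity kernel.

    leaves[i][t] is the id of the leaf that observation i reaches in tree t
    (ids only need to be unique within a tree). K[i][k] is the fraction of
    trees in which observations i and k fall in the same leaf.
    """
    n, n_trees = len(leaves), len(leaves[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            shared = sum(leaves[i][t] == leaves[k][t] for t in range(n_trees))
            K[i][k] = K[k][i] = shared / n_trees
    return K


# Hypothetical leaf ids for 4 observations in a 3-tree forest.
leaves = [
    [0, 1, 2],
    [0, 1, 3],
    [4, 1, 2],
    [5, 6, 7],
]
K = rf_proximity_kernel(leaves)
for row in K:
    print([round(x, 3) for x in row])
```

Every observation shares all of its leaves with itself, so K[i][i] = 1 and each row sum is positive. A matrix like this can therefore serve as the weight matrix in a Nadaraya-Watson smoother, with neighborhoods shaped by the fitted trees rather than by a fixed bandwidth.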
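The real-data evaluation relies on five-fold cross-validation, and the fusion model combines a varying coefficient model, LR, and a CNN. The abstract does not say how the three models are combined, so this sketch assumes simple soft voting (averaging the predicted class-1 probabilities). The fold sizes, probability values, and labels are hypothetical placeholders standing in for real model outputs.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 once and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def soft_vote(prob_lists, weights=None):
    """Fuse per-model P(y = 1) predictions by a weighted average (equal weights by default)."""
    m = len(prob_lists)
    weights = weights if weights is not None else [1.0 / m] * m
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists))
            for i in range(len(prob_lists[0]))]


def accuracy(probs, labels, threshold=0.5):
    """Share of samples whose thresholded probability matches the 0/1 label."""
    hits = sum(int((p >= threshold) == bool(t)) for p, t in zip(probs, labels))
    return hits / len(labels)


folds = k_fold_indices(23, k=5)
for f, test_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # Placeholder: fit screening and classifiers on train_idx, evaluate on test_idx.
    assert not set(train_idx) & set(test_idx)

# Hypothetical test-fold probabilities from three fitted models.
p_vc = [0.7, 0.4, 0.6, 0.2]
p_lr = [0.9, 0.3, 0.4, 0.1]
p_cnn = [0.8, 0.6, 0.3, 0.2]
labels = [1, 0, 1, 0]
fused = soft_vote([p_vc, p_lr, p_cnn])
print([round(p, 3) for p in fused], accuracy(fused, labels))
```

Passing unequal `weights` to `soft_vote` is one way a fusion could lean on the more stable model while still drawing on the others; whether the thesis weights its models, or stacks them instead, is not stated.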

Keywords/Search Tags:

Ultra high dimensional data, Feature screening, Spherical data, Kernel function, Fusion model

Related items

1	Research On Feature Selection Of Ultra-high-dimensional Competitive Risk Data Based On Correlation Rank
2	Grouped Feature Screening For Ultra-high Dimensional Data
3	Feature Screening Of Ultra-high Dimensional Random Missing Data Model
4	Some Studies On Feature Screening Of Ultra-high-dimensional Longitudinal Data And Group Structured Data
5	In The Case Of Ultra-high Dimensional Data, The Variable Filtering Of The Model Can Be Added
6	Research On Feature Selection Method Without Model Constraints Under Ultra High Dimensional Data
7	Quantile Feature Screening For Ultra High Dimensional Censored Data
8	Gini-Index Based Feature Screening For Ultrahigh Dimensional Categorical Data
9	Adaptive Variable Screening For Ultra-High Dimensional Heterogeneous Data
10	Robust Variable Selection And Feature Screening Methodology And Application