
Research On Variable Selection Algorithm Based On High-dimensional Complex Data

Posted on: 2023-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: A K Guo
Full Text: PDF
GTID: 2530307142488964
Subject: Statistics
Abstract/Summary:
With the development of computer technology and networks, collecting and storing data has become increasingly convenient. The data sets we must handle have evolved from simple feature data to large-scale complex data characterized by large volume, diverse structure, high dimensionality, and strong correlation among attributes. Traditional data analysis methods face problems such as the "curse of dimensionality" and algorithm failure on high-dimensional complex data. How to process such data simply and efficiently, and to mine the value of the data accurately and effectively, is a current research hotspot and difficulty.

High-dimensional complex data carry a great deal of information, but analyzing and mining them involves complicated and heavy computation, data noise, and large storage requirements. Dimensionality reduction is an effective means of addressing these problems, and this thesis studies variable selection for high-dimensional complex data. It first surveys dimensionality-reduction methods as a whole, then classifies and summarizes feature selection methods, focusing on those based on data measures. By introducing the information of the class variable into the redundancy measure between a candidate feature and the already-selected feature subset, and by considering the symmetric uncertainty between each feature and the class variable, it proposes a new information-theoretic filter feature selection method based on the maximum-relevance minimum-redundancy principle.

To evaluate the proposed method, it is compared with six benchmark methods on 11 public machine-learning data sets, verifying the effectiveness of the proposed algorithm. The results show that, compared with the six baseline algorithms, the proposed NMIJMI algorithm achieves better classification results to varying degrees under three different classifiers (KNN, SVM, and RF). This indicates that the new feature selection algorithm can achieve good classification performance on some data sets and can select feature subsets that improve classifier performance.
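The abstract describes NMIJMI only at a high level, so the following is a minimal, illustrative sketch of the generic maximum-relevance minimum-redundancy idea it builds on: score each candidate feature by its mutual information with the class variable, penalized by its average mutual information with the features already selected. The function names, the use of average redundancy, and the toy data are assumptions for illustration, not the thesis's exact criterion.

```python
import math
from collections import Counter

def mutual_info(x, y):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def mrmr_select(features, labels, k):
    """Greedy max-relevance min-redundancy feature selection.

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels (same length as each feature)
    k:        number of features to select
    """
    selected, remaining = [], list(features)
    # Relevance: mutual information of each feature with the class variable.
    relevance = {f: mutual_info(features[f], labels) for f in remaining}
    while remaining and len(selected) < k:
        def score(f):
            # Redundancy: average MI with the already-selected features.
            red = (sum(mutual_info(features[f], features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            return relevance[f] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy data set where one feature duplicates the class label, the greedy criterion selects that feature first, since its relevance (1 bit here) dominates any redundancy penalty.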
Keywords/Search Tags:High-dimensional complex data, Feature selection, Mutual information, Maximum correlation and minimum redundancy, Conditional mutual information