
Research On Variable Selection Algorithm Based On High-dimensional Complex Data

Posted on: 2023-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: A K Guo
Full Text: PDF
GTID: 2530307142488964
Subject: Statistics
Abstract/Summary:
With the development of computer technology and networks, collecting and storing data has become increasingly convenient. The data sets we must handle have evolved from simple feature data to large-scale complex data characterized by large volume, diverse structure, high dimensionality, and strong correlation among attributes. Traditional data analysis methods face problems such as the "curse of dimensionality" and algorithm failure on high-dimensional complex data. How to process such data simply and efficiently, and to mine the value of the data accurately and effectively, is a current research hotspot and difficulty.

High-dimensional complex data carry a great deal of information, but analyzing and mining them involves complicated and heavy computation, data noise, and large storage requirements. Dimensionality reduction is an effective means of addressing these problems, and this thesis studies variable selection for high-dimensional complex data. It first surveys dimensionality-reduction methods as a whole, then classifies and summarizes feature selection methods, focusing on those based on data measures. By introducing the information of the class variable into the redundancy measure between a candidate feature and the already-selected feature subset, and by considering the symmetric uncertainty between each feature and the class variable, it proposes a new information-theoretic filter feature selection method based on the maximum-relevance minimum-redundancy principle.

To evaluate the proposed method, it is compared with six benchmark methods on 11 public machine-learning data sets, verifying the effectiveness of the proposed algorithm. The results show that, compared with the six baseline algorithms, the proposed NMIJMI algorithm achieves better classification results to varying degrees under three different classifiers (KNN, SVM, and RF). This indicates that the new feature selection algorithm can achieve good classification performance on some data sets and can select feature subsets that improve classifier performance.
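The abstract describes NMIJMI only at a high level, so the following is a minimal, illustrative sketch of the generic maximum-relevance minimum-redundancy idea it builds on: score each candidate feature by its mutual information with the class variable, penalized by its average mutual information with the features already selected. The function names, the use of average redundancy, and the toy data are assumptions for illustration, not the thesis's exact criterion.

```python
import math
from collections import Counter

def mutual_info(x, y):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def mrmr_select(features, labels, k):
    """Greedy max-relevance min-redundancy feature selection.

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels (same length as each feature)
    k:        number of features to select
    """
    selected, remaining = [], list(features)
    # Relevance: mutual information of each feature with the class variable.
    relevance = {f: mutual_info(features[f], labels) for f in remaining}
    while remaining and len(selected) < k:
        def score(f):
            # Redundancy: average MI with the already-selected features.
            red = (sum(mutual_info(features[f], features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            return relevance[f] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy data set where one feature duplicates the class label, the greedy criterion selects that feature first, since its relevance (1 bit here) dominates any redundancy penalty.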
Keywords/Search Tags:High-dimensional complex data, Feature selection, Mutual information, Maximum correlation and minimum redundancy, Conditional mutual information