
Research On Feature Selection Algorithm In Data With Large Scale And High Dimension Based On Evolutionary Multi-Objective Optimization

Posted on: 2020-06-06  Degree: Master  Type: Thesis
Country: China  Candidate: W Guo
GTID: 2428330575965394  Subject: Engineering
Abstract/Summary:
Feature selection is an important data preprocessing step in machine learning and data mining. Its purpose is to select a relevant subset of the original features in order to obtain better performance, lower computational cost and better model interpretability. However, in practical applications such as information retrieval and biometric identification, large-scale and high-dimensional data pose serious challenges to existing feature selection methods. Existing feature selection algorithms combine the number of features and the accuracy of the selected subset into a single weighted objective and solve it with gradient-based methods. Although such methods can achieve good results, they require prior knowledge, and the objective function must be convex and smooth. In this thesis, the number of features and the accuracy on the selected feature subset are treated as two independent objectives and optimized in a multi-objective manner, with evolutionary computation, which has good global search ability, as the optimization tool. Two feature selection algorithms are proposed for large-scale and high-dimensional data. The main work of this thesis consists of the following two parts:

(1) Feature selection for large-scale data. This problem is computationally expensive because of the large amount of data. Targeting the pairwise ranking problem, in which the number of training pairs grows as O(n^2) in the number of instances n, this thesis proposes MOFSRank, a feature selection algorithm based on evolutionary multi-objective optimization. The algorithm includes three strategies:
1. Multi-objective instance selection strategy: selects informative instance subsets from the training set and removes possibly noisy data from the original set, so that the subsequent feature selection works on a small number of informative training instances.
2. Multi-objective feature selection strategy: building on the selected instances, and to further improve feature selection performance, adopts an adaptive mutation probability to obtain feature subsets with high ranking accuracy and low redundancy.
3. Pareto ensemble strategy: adopts a mixed encoding and uses the Pareto solutions obtained in the second phase to produce a better feature subset as the final output.
Experimental results on large-scale datasets show that MOFSRank achieves good ranking results with fewer selected features.

(2) Feature selection for high-dimensional data. When existing evolutionary computation methods are applied to this problem, the huge search space requires a large number of fitness evaluations, so the optimization process incurs a high computational cost. For this problem, this thesis proposes GMA, a guidance model algorithm based on evolutionary multi-objective optimization. The algorithm includes two strategies:
1. Fast zeroing strategy: quickly eliminates irrelevant and redundant features and reduces the search space. Experimental results show that the higher the data dimension, the more pronounced its advantage.
2. Guidance model pre-screening strategy: trains a guidance model on historical fitness values and uses it to pre-screen individuals in the population, reducing the number of real evaluations and speeding up the search.
Experimental results on high-dimensional datasets show that GMA obtains better feature subsets at a lower computational cost.
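Treating feature count and accuracy as two separate objectives means candidate subsets are compared by Pareto dominance rather than by a weighted sum. The minimal sketch below is a generic illustration, not the thesis's implementation: it assumes each candidate is a boolean feature mask paired with a hypothetical error rate, with both objectives minimised, and extracts the non-dominated set.

```python
from typing import List, Tuple

# A candidate is (feature_mask, error_rate). Both objectives are minimised:
# objective 1 = number of selected features, objective 2 = error rate.
Candidate = Tuple[Tuple[bool, ...], float]


def objectives(c: Candidate) -> Tuple[int, float]:
    mask, error = c
    return sum(mask), error


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    fa, fb = objectives(a), objectives(b)
    no_worse = all(x <= y for x, y in zip(fa, fb))
    better = any(x < y for x, y in zip(fa, fb))
    return no_worse and better


def pareto_front(pop: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in pop if not any(dominates(o, c) for o in pop)]


# Hypothetical population of 4-feature masks with made-up error rates.
population = [
    ((True, True, True, False), 0.10),   # 3 features, 10% error
    ((True, False, False, False), 0.25), # 1 feature, 25% error
    ((True, True, False, False), 0.12),  # 2 features, 12% error
    ((True, True, False, True), 0.15),   # 3 features, 15% error: dominated by the first
]
front = pareto_front(population)
for mask, err in front:
    print(sum(mask), err)
```

Each surviving subset represents a different trade-off between compactness and accuracy; no single weight has to be chosen in advance.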
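The O(n^2) training size in pairwise ranking comes from turning document pairs into training examples. The sketch below uses the common difference-vector pairwise transform (the style used by RankSVM-type methods), swapped in here for illustration; the documents, features and relevance labels are hypothetical, and the thesis's own encoding may differ.

```python
from itertools import combinations
from typing import List, Tuple

Vector = List[float]


def pairwise_transform(features: List[Vector], labels: List[int]) -> List[Tuple[Vector, int]]:
    """Turn one query's documents into pairwise training examples.

    Each pair (i, j) with different relevance labels yields the difference
    vector x_i - x_j, with target +1 if i is more relevant than j, else -1.
    """
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if labels[i] == labels[j]:
            continue  # tied pairs carry no ordering information
        diff = [a - b for a, b in zip(features[i], features[j])]
        pairs.append((diff, 1 if labels[i] > labels[j] else -1))
    return pairs


# Hypothetical query: 4 documents, 2 features each, graded relevance labels.
docs = [[0.9, 0.1], [0.4, 0.3], [0.2, 0.8], [0.1, 0.5]]
rels = [2, 1, 0, 0]
examples = pairwise_transform(docs, rels)
print(len(examples))  # 5 of the 6 possible pairs (documents 2 and 3 are tied)

# With all labels distinct, n documents give n*(n-1)/2 pairs, i.e. O(n^2).
for n in (10, 100, 1000):
    print(n, n * (n - 1) // 2)
```

This quadratic growth is what makes reducing the instance set before feature selection attractive.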
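The abstract does not specify how the adaptive mutation probability is set. As one hypothetical example of the idea, the sketch below shrinks the bit-flip mutation rate linearly over the generations (more exploration early, more exploitation late). The schedule and the names `adaptive_rate`, `p_max` and `p_min` are assumptions for illustration only.

```python
import random
from typing import List


def adaptive_rate(gen: int, max_gen: int, p_max: float = 0.2, p_min: float = 0.01) -> float:
    """Hypothetical schedule: the mutation probability decreases linearly
    from p_max at generation 0 to p_min at generation max_gen."""
    frac = gen / max(1, max_gen)
    return p_max - (p_max - p_min) * frac


def bit_flip_mutation(mask: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each feature bit (1 = selected, 0 = not selected) independently with probability rate."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


rng = random.Random(0)  # fixed seed so runs are reproducible
parent = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
for gen in (0, 50, 100):
    p = adaptive_rate(gen, max_gen=100)
    child = bit_flip_mutation(parent, p, rng)
    print(gen, round(p, 3), child)
```

A feature-wise rate (for example, biased toward switching off redundant features) would be another plausible form of adaptation; which one the thesis uses is not stated here.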
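Guidance-model pre-screening is a form of surrogate-assisted evaluation: a cheap model trained on previously evaluated individuals ranks new offspring, and only the most promising ones receive a real (expensive) evaluation. The sketch below assumes a k-nearest-neighbour surrogate with Hamming distance and a single scalar error for simplicity; the thesis's actual guidance model and its multi-objective handling are not described in the abstract, and `toy_error` is a hypothetical stand-in for training and testing a classifier.

```python
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]


def hamming(a: Mask, b: Mask) -> int:
    return sum(x != y for x, y in zip(a, b))


def knn_predict(archive: List[Tuple[Mask, float]], mask: Mask, k: int = 2) -> float:
    """Predict error as the mean error of the k archived masks nearest in Hamming distance."""
    nearest = sorted(archive, key=lambda item: hamming(item[0], mask))[:k]
    return sum(f for _, f in nearest) / len(nearest)


def prescreen(offspring: List[Mask], archive: List[Tuple[Mask, float]],
              true_eval: Callable[[Mask], float], budget: int) -> List[Tuple[Mask, float]]:
    """Rank offspring by predicted error (lower is better), really evaluate only
    the `budget` most promising ones, and add them to the archive."""
    ranked = sorted(offspring, key=lambda m: knn_predict(archive, m))
    evaluated = [(m, true_eval(m)) for m in ranked[:budget]]
    archive.extend(evaluated)
    return evaluated


calls: List[Mask] = []


def toy_error(m: Mask) -> float:
    # Hypothetical ground truth: features 0 and 1 are relevant; every selected feature adds cost.
    calls.append(m)
    return 0.3 * (2 - m[0] - m[1]) + 0.02 * sum(m)


archive = [(m, toy_error(m)) for m in [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)]]
calls.clear()
offspring = [(1, 1, 1, 0), (0, 0, 0, 1), (1, 1, 0, 1), (0, 0, 1, 0)]
chosen = prescreen(offspring, archive, toy_error, budget=2)
print(len(calls), [m for m, _ in chosen])
```

Here only two of the four offspring are really evaluated, and the surrogate picks the two that keep both relevant features; the archive grows with each generation, so the surrogate is retrained on more history over time.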
Keywords/Search Tags: Feature selection, Large scale, High-dimensional data, Evolutionary multi-objective optimization