
Research On Feature Selection Method Based On Evolutionary Computation

Posted on: 2020-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Li
Full Text: PDF
GTID: 1368330605480335
Subject: Software engineering
Abstract/Summary:
Feature selection is one of the most active topics in big data analysis and data mining research, and the proliferation of data poses new challenges for feature selection methods. Although feature selection can effectively handle high-dimensional data and improve the generalization ability of learning models, the growing scale of data and the variety of data types and structures seriously degrade the performance of the learning algorithms used for data analysis. According to the importance of the information they contain, the original features can be divided into irrelevant, relevant, and redundant features. Because relevant and redundant features can convert into one another, searching for the optimal feature subset becomes more challenging. This dissertation formulates feature selection as a feature combination optimization problem and uses evolutionary computation, which offers good global search and parallel computing characteristics, to search the feature space. The work covers the design of individual encoding strategies, the evolutionary search mechanism, the construction of optimization objectives, and the performance metrics for feature combination optimization. Feature selection is studied from two perspectives: supervised and unsupervised evolutionary computation. The former covers feature selection algorithms based on single-objective and multi-objective evolution that incorporate classification boundary information, together with a binary differential evolution feature selection algorithm based on individual entropy. The latter addresses the lack of label information and covers an unsupervised feature selection algorithm based on evolutionary computation together with clustering analysis based on evolutionary computation. The main innovations and specific research contents are:

(1) A feature selection algorithm based on genetic optimization of granular information is proposed. A feature selection framework based on granular information is constructed, and the classification information of features is analyzed through granulation to measure the quality of feature subsets. From the perspective of information granulation, a feature granulation operator based on a novel binary genetic algorithm and a sample granulation operator based on the granularity ? neighborhood rough model are designed. For feature granulation, a granulation mechanism is designed to evaluate candidate feature subsets so that the algorithm can select important features. For sample granulation, different granularity layers are divided according to the neighborhood radius, and the degree of dependency of the decision attribute on the condition attributes is calculated at each granularity layer. To study the influence of granularity on feature subset selection further, granularity optimization based on a genetic algorithm is proposed. Its main function is to select a reasonable granularity value adaptively so that the final feature subset is optimal. The experimental results and examples show that the proposed method can significantly improve the classification accuracy of feature subsets.

(2) A hybrid feature selection algorithm based on improved multi-objective optimization is proposed. To address the poor overall performance of feature subsets selected by single-objective optimization algorithms, the conflicts among multiple optimization objectives are analyzed. The influence of classification boundary information on the key metrics of the neighborhood model is studied further, and a new neighborhood model is used to calculate the positive region. It integrates the classification information contained in the boundary region into the positive region so that the selected feature subset contains as many relevant features as possible. On this basis, feature subset size and classification error rate are adopted as the optimization objectives to evaluate the quality of candidate feature subsets comprehensively. A binary encoding strategy is designed, and the optimization objectives are embedded in each individual's code so that individual quality can be monitored in real time. A non-dominated sorting operator is used to obtain the Pareto solution set, and the crowding distance between individuals is calculated to maintain the diversity of the evolving population. A feature kernel set is proposed, and the intersection of the different feature subsets on the Pareto front is analyzed. The experimental results show that the proposed method can effectively balance the number of features against classification accuracy and obtain a better compromise solution.

(3) A feature selection algorithm based on binary differential evolution with individual entropy is proposed. To study the influence of diversity and convergence on feature subset optimization during evolution, an efficient binary differential evolution algorithm is proposed. First, individual entropy is defined and the relationship between individual entropy and population diversity is quantified. Individual entropy is then integrated into the optimization objective function to monitor changes in population diversity during the search of the feature space. Next, an initialization strategy based on local opposition-based learning is proposed, which avoids the non-convergence or premature convergence caused by random initialization of the population. A discrete mutation operator that satisfies the closure condition is designed, and different sub-operations are applied at different stages of evolution to ensure both diversity and convergence. A discrete crossover operator based on individual entropy is designed, which selects the crossover factor adaptively according to feedback from individual fitness. This reduces the negative effect of subjective factors on the evolutionary process. The experimental results show that the proposed method significantly reduces the time cost of the evolutionary algorithm while maintaining good classification performance and feature subset size.

(4) An unsupervised feature selection algorithm based on differential evolution, together with a clustering optimization algorithm, is proposed. Because no label information is available to guide the feature subset search, a manifold learning model is introduced to construct a new Laplacian calculation method that describes the internal structure of the data set and preserves the near and far relationships between the original samples. Using this Laplacian-based measure of how well the selected features preserve local structure, an unsupervised feature selection optimization algorithm based on discrete differential evolution is proposed, with new individual mutation and crossover operators for obtaining an optimized feature subset. To verify the quality of the selected feature subsets, a clustering optimization algorithm based on continuous differential evolution is proposed. A pattern-based encoding strategy is designed to represent the individuals in the population, and the compactness and sparsity between samples are used as the clustering optimization objective. Clustering accuracy, normalized mutual information, and the adjusted Rand index are then used to analyze the clustering results. Compared with existing unsupervised feature selection algorithms based on sparse learning, the proposed method can effectively select key features that retain the internal manifold structure of the data and improve the clustering result.
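
The dependency measure mentioned in contribution (1) can be illustrated with a minimal sketch. It assumes the textbook neighborhood rough set definition: a sample lies in the positive region when every sample within a given radius, measured by Euclidean distance over the selected features, shares its class label, and the dependency degree is the fraction of such samples. The function name neighborhood_dependency and the parameter delta are hypothetical, and the dissertation's own model may differ.

```python
import numpy as np

def neighborhood_dependency(X, y, mask, delta):
    """Fraction of samples whose delta-neighborhood is label-pure.

    X: (n_samples, n_features) array, y: class labels,
    mask: boolean vector marking the selected features (assumed encoding),
    delta: neighborhood radius (assumed parameter name).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature subset cannot discriminate any sample
    Z = X[:, mask]
    # Pairwise Euclidean distances restricted to the selected features.
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A sample is in the positive region if all of its neighbors share its label.
    in_positive_region = np.all(~neighbors | same_label, axis=1)
    return float(in_positive_region.mean())
```

A genetic algorithm could use such a value as part of the fitness of a binary individual, where each bit of mask switches one feature on or off.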
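
The Pareto selection step in contribution (2) can be sketched as follows, assuming both objectives (feature subset size and classification error rate) are minimized and that crowding distance is computed as in NSGA-II. The abstract does not name the exact operators, so the functions pareto_front and crowding_distance are illustrative stand-ins rather than the dissertation's implementation.

```python
import numpy as np

def pareto_front(F):
    """Indices of the non-dominated rows of F (all objectives minimized)."""
    F = np.asarray(F, dtype=float)
    front = []
    for i in range(len(F)):
        # Row j dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(F <= F[i], axis=1)
        better = np.any(F < F[i], axis=1)
        if not np.any(no_worse & better):
            front.append(i)
    return front

def crowding_distance(F):
    """NSGA-II style crowding distance for the rows of F."""
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        span = F[order[-1], k] - F[order[0], k]
        # Boundary solutions are always kept.
        dist[order[0]] = dist[order[-1]] = np.inf
        if span == 0 or n < 3:
            continue
        # Interior solutions accumulate the normalized gap between neighbors.
        gaps = F[order[2:], k] - F[order[:-2], k]
        dist[order[1:-1]] += gaps / span
    return dist
```

For example, with candidate subsets scored as (size, error) pairs, pareto_front returns the trade-off solutions, and crowding_distance favors those in sparsely populated parts of the front.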
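
The locality-preservation measure in contribution (4) is not specified in the abstract. As a rough illustration, the sketch below uses the classic Laplacian score as a stand-in: a k-nearest-neighbor graph with heat-kernel weights is built, and each feature is scored by how much it varies across graph edges relative to its overall weighted variance, so a lower score indicates better preservation of local structure. The function name laplacian_scores and the parameters k and t are assumptions.

```python
import numpy as np

def laplacian_scores(X, k=5, t=1.0):
    """Classic Laplacian score per feature (lower = better locality preservation)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Exclude each sample from its own neighbor list.
    sq_nn = sq.copy()
    np.fill_diagonal(sq_nn, np.inf)
    nn = np.argsort(sq_nn, axis=1)[:, :k]
    # Symmetric k-nearest-neighbor adjacency.
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True
    A |= A.T
    S = np.where(A, np.exp(-sq / t), 0.0)  # heat-kernel edge weights
    d = S.sum(axis=1)                      # node degrees
    L = np.diag(d) - S                     # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        # Remove the degree-weighted mean of the feature.
        f = X[:, r] - (X[:, r] @ d) / d.sum()
        denom = f @ (d * f)
        scores[r] = (f @ L @ f) / denom if denom > 0 else np.inf
    return scores
```

A discrete differential evolution search could then treat an aggregate of these scores over a candidate subset as the objective to minimize. That coupling is an assumption, not something stated in the abstract.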
Keywords/Search Tags:Feature selection, Evolutionary computation, Combinatorial optimization, Neighborhood rough set, Convergence analysis