Research On Optimization Methods For Feature Selection

Posted on: 2018-03-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Teng
Full Text: PDF
GTID: 1318330542991531
Subject: Computer application technology
Abstract/Summary:
With the continuous generation and accumulation of data in various fields, massive data bring new challenges to machine learning and data mining tasks. How to remove irrelevant and redundant data and use the relevant data to obtain valuable information has become a focus of current research. Feature selection is a preprocessing step for data mining, machine learning, pattern recognition, and related tasks. Its main purpose is to reduce the scale of the data significantly while preserving the information expressed by the original feature set. According to the sample distribution of the data, a corresponding feature evaluation method is used to select an optimal or approximately optimal feature subset to replace the original feature set. The selected feature subset improves the efficiency of data processing, maintains recognition accuracy, and strengthens the analysis results. Through a more stable expression of features, it becomes easier to find the intrinsic links between the subjects.

Feature selection can be regarded as a discrete combinatorial optimization problem, so this thesis uses information theory and evolutionary algorithms to optimize the selection of feature subsets, and studies two aspects of feature selection: the optimization of the search strategy and the optimization of the evaluation method. The work focuses on the combined effect of the feature subset. Two different evolutionary feature selection optimization algorithms are proposed for the search strategy. A new evaluation method is proposed for feature subset evaluation, which is effective for both single-label and multi-label feature selection. The main contributions and research content of this thesis are as follows:

(1) This thesis proposes an optimization method for the search strategy of single-label feature selection based on correlation information entropy. Since greedy search cannot obtain the globally optimal feature subset, a genetic algorithm is used to search the feature space. The evolutionary search obtains an adaptive feature subset without a manually fixed subset size. To prevent the evolutionary search from falling into a local optimum and to avoid enlarging the subset, two mutation strategies suited to feature selection are proposed. The dynamic mixed mutation strategy is formed by combining the single-point mutation strategy. Finally, linear discriminant analysis is used to evaluate the feature subset as a whole. This part realizes a dynamic adaptive evolutionary search method that evaluates the combined effect of the subset. This optimization focuses on the global search of the feature space.

(2) This thesis proposes an optimization method for the evaluation of single-label feature selection based on correlation information entropy. In feature selection, classical information theory is often used to evaluate feature correlation and redundancy separately, and an overall evaluation of the combined effect of the feature subset is lacking. To overcome this shortcoming, the theory of correlation information entropy from multi-sensor information systems is mapped into the feature selection space. The degree of independence and redundancy within the feature set is evaluated based on this theory. The correlation matrix of the features in a subset is calculated according to the mutual information between features and class. The advantage of the method is that the multivariate relationships among the different features in the subset are fully considered when the correlation information entropy is calculated. The evaluation method can be used as a feature ranking algorithm or, combined with a control parameter for redundant information, as an adaptive feature subset selection method. This optimization focuses on an information measure of the combined effect of the feature subset as a whole.

(3) This thesis proposes an adaptive feature selection method based on V-shaped binary particle swarm optimization. The search speed in the feature space is also an important index of feature selection. To exploit the speed advantage of binary particle swarm optimization in evolutionary search, this work combines the evolutionary search strategy with the correlation information entropy evaluation, which improves the time efficiency of evolutionary search on large-scale data sets and completes the overall combined evaluation of the subset. The evolutionary search of V-shaped binary particles is superior to the greedy feature selection method when the same measure is used. This optimization focuses on the cooperation between the global search strategy and the measure of the combined effect, to verify the better performance of the two optimization methods used together.

(4) This thesis proposes an optimization method for the evaluation of multi-label feature selection based on neighborhood correlation information entropy. In multi-label feature selection, this thesis measures the overall performance of features under different labels. Unlike in single-label feature selection, discretizing continuous data under multiple labels loses a lot of information about the feature combination. Therefore, neighborhood information entropy is used to calculate the amount of information under each label. Based on the neighborhood information entropy, the neighborhood information matrix of the features is constructed, and the correlation information entropy is used to measure the whole subset. This part of the work avoids the limitation of treating each label as a separate single-label feature selection problem, and it offers a new solution to the multi-label feature selection problem. The evaluation methods based on correlation information entropy and neighborhood correlation information entropy are general and effective in both single-label and multi-label feature selection. This optimization is a new evaluation method for feature subsets in multi-label feature selection.
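
As an illustration only, the two building blocks named in contributions (2) and (3) might be sketched as follows. This is not the thesis's implementation. The entropy below follows the form commonly given for correlation information entropy in multi-sensor systems (eigenvalues of an N×N correlation matrix, normalized by N, with logarithm base N), which is an assumption about the definition used here. The Pearson correlation matrix is a stand-in for the mutual-information-based matrix the abstract describes, and the bit update uses |tanh(v)| as one common V-shaped transfer function. The function names are hypothetical.

```python
import numpy as np


def correlation_information_entropy(X):
    """Score the combined independence of the feature columns of X.

    X has shape (n_samples, n_features) with at least two non-constant
    columns. The result is close to 1 when the columns are uncorrelated
    and close to 0 when they are fully redundant.

    Assumption: Pearson correlation stands in for the
    mutual-information-based correlation matrix named in the abstract.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if n_features < 2:
        raise ValueError("need at least two features")
    R = np.corrcoef(X, rowvar=False)          # symmetric, trace = n_features
    eig = np.linalg.eigvalsh(R)               # real eigenvalues
    eig = np.clip(eig, 0.0, None)             # remove tiny negative round-off
    p = eig / eig.sum()                       # normalized spectrum, sums to 1
    p = p[p > 0]                              # treat 0 * log(0) as 0
    # Dividing by log(n_features) is the same as taking logs in base N.
    return float(-(p * np.log(p)).sum() / np.log(n_features))


def v_shaped_update(bits, velocity, rng):
    """Flip each bit with probability |tanh(velocity)|.

    Assumption: |tanh(v)| is one common V-shaped transfer function;
    the abstract does not say which one the thesis uses.
    """
    bits = np.asarray(bits)
    flip_prob = np.abs(np.tanh(velocity))
    flip = rng.random(bits.shape) < flip_prob
    return np.where(flip, 1 - bits, bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    mask = np.array([1, 0, 1, 1, 0])          # 1 = feature is selected
    print(correlation_information_entropy(X[:, mask == 1]))
    print(v_shaped_update(mask, rng.normal(size=5), rng))
```

In a search loop, each particle's bit vector would act as a feature mask and the entropy of the selected columns would feed the fitness function. How relevance to the class and the redundancy control parameter enter that fitness is not specified in the abstract and is left out here.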
Keywords/Search Tags: feature selection, evolutionary search, combinatorial optimization, combined effect, correlation information entropy