
Research On Heuristic Hybrid Feature Selection Method For Structured Data

Posted on: 2020-10-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y F Zheng
GTID: 1368330575481198
Subject: Computer application technology
Abstract/Summary:
With the advancement of society and the rapid development of science and technology, large amounts of data are generated in everyday life and in production. Redundant data can be eliminated by reducing the dimensionality of the data set. There are two common dimension reduction methods: feature extraction and feature selection. Feature selection is widely used to find the optimal feature subset because it does not change the information carried by the original features. The basic methods of feature selection are the wrapper method and the filter method. The wrapper method offers high classification accuracy but has high time complexity, whereas the filter method has low time complexity. As requirements for classification accuracy rise, a single feature selection method can no longer meet them. The hybrid feature selection method, which combines the two, provides a new solution. In application, the hybrid feature selection method has four problems: the classification accuracy needs to be improved, the data is high-dimensional, only a single candidate feature subset is produced, and correlation and redundancy are weighted equally.

To solve these problems, three hybrid feature selection algorithms are proposed in this paper. The specific content of each algorithm is as follows.

(1) For the first and fourth problems, the Maximum Spearman Minimum Covariance Cuckoo Algorithm (MSMCCS) is proposed, using the embedded feature selection method. First, a filter algorithm based on the Spearman coefficient and covariance is proposed, called Maximum Spearman Minimum Covariance (MSMC). Second, three parameters are introduced in MSMC to dynamically adjust the weights of relevance and redundancy, which can improve the correlation of the feature subsets and reduce their redundancy. Third, in
the improved cuckoo search (CS) algorithm, the improved position update strategy increases the convergence speed of the algorithm. The candidate feature subset is selected by the weight combination strategy and then adjusted by the cross-mutation idea. Finally, features that were filtered out have the opportunity to enter the optimal feature subset, which improves the classification accuracy. The experimental results show that the proposed algorithm converges quickly and that its classification accuracy is significantly better than that of the other 10 algorithms.

(2) To address the high dimensionality of microarray data, a parallel hybrid feature selection method is proposed, called K-value Maximum Relevance Minimum Redundancy Improved Grey Wolf Optimization (KMR2IGWO). First, K optimal genes are selected from the data set according to the Maximum Relevance Minimum Redundancy algorithm. Second, the data set consisting of the K genes is initialized by two methods: random selection and selection of features at different ratios. Finally, by adjusting the parameters of the fitness function and changing the position update strategy, the combination of genes with the best classification accuracy and the shortest length is selected. The experimental results show that the proposed algorithm achieves significant dimensionality reduction on 14 data sets, with the number of features reduced to between 0.04% and 0.4%.

(3) To solve the third and fourth problems, a threshold-adjusted parallel hybrid feature selection algorithm called the Maximum Pearson Maximum Distance Improved Whale Optimization Algorithm (MPMDIWOA) is proposed. First, a filter algorithm based on the Pearson correlation coefficient and the correlation distance is proposed, called Maximum Pearson Maximum Distance (MPMD). Two parameters are introduced in MPMD to adjust the weights of correlation and redundancy. Second, in the whale optimization algorithm, the voting method is used to jump out of the local
optimum. Third, an initialization method called Alternative Two Lose One (ATLO) is proposed. Fourth, the concepts of Maximum Value without Change (MVWC) and Threshold are proposed. By adjusting the threshold, the filter algorithm provides multiple candidate feature subsets. The wrapper algorithm then finds the best classification accuracy among these candidate feature subsets. The experimental results show that the MPMDIWOA algorithm has higher classification accuracy than 4 other algorithms on most data sets.

In summary, this paper takes the wrapper and filter feature selection methods as its basis and the hybrid feature selection method as its research subject, and studies the generation of candidate feature subsets and the selection of optimal feature subsets.
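
To make the filter idea in item (1) concrete, the following is a minimal sketch of a greedy filter that rewards Spearman relevance to the label and penalizes covariance with features already chosen. The abstract does not state the MSMC formula or its three parameters, so the scoring rule, the single weight `alpha`, and the function names `rank`, `spearman`, and `greedy_filter` are all assumptions made for illustration, not the author's algorithm. Ranks are ordinal (ties are not averaged), which is a further simplification.

```python
import numpy as np


def rank(v):
    """Ordinal ranks of v (ties broken by position; a simplification)."""
    order = np.argsort(v, kind="stable")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(len(v))
    return ranks


def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def greedy_filter(X, y, k, alpha=0.5):
    """Greedily pick k column indices of X.

    alpha is a hypothetical weight between relevance and redundancy;
    it is not one of the parameters from the dissertation.
    """
    n_features = X.shape[1]
    # Relevance: absolute Spearman correlation between each feature and y.
    relevance = np.array([abs(spearman(X[:, j], y)) for j in range(n_features)])
    # Redundancy source: absolute feature-by-feature covariance matrix.
    cov = np.abs(np.atleast_2d(np.cov(X, rowvar=False)))

    # Start from the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Mean absolute covariance with the features chosen so far.
        redundancy = np.array([cov[j, selected].mean() for j in remaining])
        score = alpha * relevance[remaining] - (1 - alpha) * redundancy
        selected.append(remaining[int(np.argmax(score))])
    return selected


# Toy data: column 1 is a scaled copy of column 0, column 2 is a
# small-scale, slightly shuffled version of the label.
y_demo = np.arange(10, dtype=float)
shuffled = np.array([1, 0, 3, 2, 5, 4, 7, 6, 9, 8], dtype=float)
X_demo = np.column_stack([y_demo, 10 * y_demo, 0.1 * shuffled])
print(greedy_filter(X_demo, y_demo, k=2))
```

Because covariance depends on feature scale, a sketch like this would normally be run on standardized features. In a hybrid method of the kind the abstract describes, the indices returned here would only form a candidate subset, which a wrapper search would then refine.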
Keywords/Search Tags:Feature Selection, Wrapper Algorithm, Filter Algorithm, Heuristic Algorithm, Hybrid Algorithm, Dimension Reduction Technology