Research And Improvement Of Feature Selection Based On Cuckoo Search Algorithm

Posted on: 2020-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
Full Text: PDF
GTID: 2428330575981211
Subject: Computer technology
Abstract/Summary:
In the era of artificial intelligence, feature selection is an important preprocessing step for big data. It helps avoid the curse of dimensionality, reduces the running time of learning algorithms, helps prevent over-fitting, and filters out noisy data. Given the huge volume of data available today, we need to find the part that is useful for training and learning, so feature selection is well worth studying. Feature selection picks high-quality features from a large data set, so it can also be understood as a search process. Exhaustive search over all feature subsets, however, defeats the purpose of feature selection. Stochastic algorithms offer an optimization approach that applies well to this problem. They include swarm intelligence algorithms such as ant colony optimization and particle swarm optimization, as well as simulated annealing and others. The cuckoo search algorithm, proposed in recent years, is another widely used stochastic algorithm. It has achieved good results on optimization problems, so we apply the discretized cuckoo algorithm to feature selection and improve it.

Recent studies have shown that the Optimized Cuckoo Feature Selection Algorithm (BCS) classifies well, but it also has shortcomings. First, BCS initializes its population randomly, which makes the search blind, and the later stages of the algorithm depend on that initialization. A poor initialization therefore seriously degrades the iterative update process. Second, the fitness function in BCS severely limits both its classification performance and its dimension reduction ability. Third, high-quality features found during one iteration are not preserved in the next.

We propose three improvements that address these shortcomings. First, we construct a new initialization strategy based on chaotic sequences. There are many kinds of chaotic maps, each with different properties, so we test several of them and use the experimental results to judge which is best suited to initializing the cuckoo search algorithm. We test the Logistic, Tent and Chebyshev maps, all of which have performed well in recent years, and conclude that the Chebyshev map is the most suitable. It not only reduces randomness but also speeds up convergence. The nests initialized by the Chebyshev map are also well structured, which benefits the later update steps. Second, the fitness function in the original BCS algorithm is too simple. We use information gain to measure the classification accuracy of the classifier and the L1 norm to measure the degree of dimension reduction, and we rewrite the BCS fitness function accordingly. Third, we want high-quality features found in one iteration to be retained in the next, which reduces the search of useless space and improves convergence speed. We therefore optimize the iterative process by using the AND operation to identify high-quality features and the OR operation to add those features to the sequence.

These three improvements form a new feature selection algorithm, FS_CSO. In the experimental phase, FS_CSO uses KNN, J48 and SVM classifiers to guide the learning process and is tested on small, medium and large data sets from the UCI repository. The experimental results show that, compared with BCS, FS_CSO significantly improves both classification performance and dimension reduction ability. Compared with efficient feature selection algorithms proposed in recent years, FS_CSO is highly competitive in both accuracy and dimension reduction.
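
The Chebyshev-based initialization described in the abstract could look roughly like the sketch below. The abstract does not give the details, so everything specific here is an assumption made for illustration: the map order of 4, the threshold at 0 used to turn chaotic values into bits, the random starting point for each nest, and the function names `chebyshev_sequence` and `init_nests`.

```python
import math
import random


def chebyshev_sequence(x0, length, order=4):
    """Iterate the Chebyshev map x_{n+1} = cos(order * arccos(x_n)) on [-1, 1]."""
    xs, x = [], x0
    for _ in range(length):
        # Clamp to [-1, 1] so acos stays defined despite floating-point drift.
        x = math.cos(order * math.acos(max(-1.0, min(1.0, x))))
        xs.append(x)
    return xs


def init_nests(n_nests, n_features, order=4, seed=0):
    """Build binary nests (feature masks) from chaotic sequences.

    Assumption: a feature is selected when its chaotic value is positive.
    """
    rng = random.Random(seed)
    nests = []
    for _ in range(n_nests):
        x0 = rng.uniform(-1.0, 1.0)  # hypothetical choice of starting point
        seq = chebyshev_sequence(x0, n_features, order)
        mask = [1 if v > 0 else 0 for v in seq]
        if not any(mask):
            mask[0] = 1  # keep at least one feature selected
        nests.append(mask)
    return nests


nests = init_nests(n_nests=5, n_features=8)
```

Each nest is a 0/1 list with one entry per feature. Fixing `seed` makes the initial population reproducible, which is convenient when comparing chaotic maps against each other.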
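
The abstract says the new fitness function combines a classification-quality term with an L1-norm measure of dimension reduction, but it does not state the formula. The sketch below is therefore only one plausible shape, not the thesis's definition: the weighted sum, the `weight` parameter and the name `fitness` are all assumptions, and `score` stands in for whatever quality measure the classifier provides.

```python
def fitness(mask, score, weight=0.9):
    """Illustrative fitness: reward classification quality and fewer features.

    mask   -- binary feature mask (1 = feature selected)
    score  -- classification-quality term in [0, 1], supplied by the caller
    weight -- hypothetical trade-off between quality and reduction
    """
    l1_norm = sum(abs(bit) for bit in mask)   # number of selected features
    reduction = 1.0 - l1_norm / len(mask)     # 1.0 means no features kept
    return weight * score + (1.0 - weight) * reduction


# With equal quality, the smaller subset scores higher.
small = fitness([1, 0, 0, 0], score=0.8)
large = fitness([1, 1, 1, 0], score=0.8)
```

For a binary mask the L1 norm is simply the count of selected features, which is why it can serve as a direct measure of how far the dimension has been reduced.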
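
The AND/OR update can be illustrated on binary feature masks. This is a minimal sketch under stated assumptions: the abstract does not say which nests are combined, so the code assumes the AND is taken over a set of the best nests and the OR is applied to each new candidate. The names `shared_features` and `inject_features` are hypothetical.

```python
def shared_features(best_masks):
    """AND the masks of the best nests: keep features that all of them select."""
    common = list(best_masks[0])
    for mask in best_masks[1:]:
        common = [a & b for a, b in zip(common, mask)]
    return common


def inject_features(candidate, common):
    """OR the shared features into a candidate so they survive into the next iteration."""
    return [c | k for c, k in zip(candidate, common)]


best = [[1, 0, 1, 1],
        [1, 1, 1, 0],
        [1, 0, 1, 0]]
common = shared_features(best)                    # [1, 0, 1, 0]
updated = inject_features([0, 1, 0, 0], common)   # [1, 1, 1, 0]
```

Features 0 and 2 are selected by every one of the best nests, so the AND keeps them, and the OR then forces them into the candidate without removing the feature it had already chosen.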
Keywords/Search Tags: FS_CSO, fitness function, initialization, update mechanism, feature selection