Feature selection is one of the research hot spots in machine learning and has received extensive attention. Over the years, the number of features in the data sets used for machine learning tasks has grown, yet many of these features are irrelevant or redundant, which makes data analysis considerably more difficult. As a result, feature selection is necessary in machine learning tasks, especially for high-dimensional data sets. Feature selection aims to reduce the dimensionality of the data and improve performance by effectively eliminating irrelevant and redundant features. In fact, feature selection is a combinatorial optimization problem and is NP-hard.

Stochastic algorithms have achieved good results on optimization problems, and nature-inspired metaheuristic algorithms are the most widely used algorithms for such problems. As one kind of stochastic algorithm, swarm intelligence optimization algorithms have become the main technique for solving global optimization problems because of their simplicity, flexibility and efficiency. Their main feature is the introduction of randomness into the search process, which distinguishes them from deterministic methods, since the latter easily fall into local optima in complex situations. Therefore, the latest swarm intelligence optimization algorithms are studied in this thesis and applied to feature selection problems. Many types of nature-inspired metaheuristic algorithms have been applied to feature selection, and each has shortcomings as well as advantages. To gain a deeper understanding, this thesis selects some typical swarm intelligence algorithms proposed at home and abroad over the years, such as the bat algorithm, gray wolf optimization
algorithm, dragonfly algorithm, whale optimization algorithm, locust optimization algorithm and sparrow search algorithm. Twenty-two standard CEC test functions are used to compare them in terms of convergence speed, accuracy and stability; a detailed analysis is conducted, and related improved methods are also compared. The sparrow search algorithm is found to give the most outstanding performance.

Proposed in 2020, the sparrow search algorithm offers high search accuracy, fast convergence, good stability and strong robustness, and it is suited to problems with a continuous search space. To date, it has not been applied to feature selection problems. Because the algorithm has strong global search ability, this thesis proposes a binary sparrow search algorithm that selects the optimal feature subset in order to improve the performance of feature selection. To transform a sparrow from a free position into the corresponding binary solution, an S-shaped transfer function is applied in each dimension to map the elements of the position vector into the interval (0, 1], which keeps the sparrow's position from being updated continuously. A random threshold is then used to determine the updated binary position of the sparrow. To search for the optimal solution, an SVM classifier is used to calculate the fitness value of each sparrow by classifying and evaluating its feature subset.

To verify the effectiveness of the proposed algorithm, the binary sparrow search algorithm is compared with four wrapper feature selection methods. Feature selection is performed on 12 different UCI data sets, and performance is measured by average classification accuracy, fitness value, number of selected features and computation time. Statistical significance is assessed with the Wilcoxon signed-rank test. From the experimental
results, the proposed method is shown to be highly competitive in terms of average classification accuracy, computation time and optimal feature selection.
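
The binarization step described in the abstract (an S-shaped transfer function applied per dimension, followed by a random threshold) can be illustrated with a minimal sketch. The abstract does not give the exact transfer function, so the logistic sigmoid used here is an assumption, and the names `s_shaped`, `binarize` and `selected_features`, as well as the example position vector, are hypothetical. The SVM-based fitness evaluation is not shown.

```python
import math
import random


def s_shaped(x: float) -> float:
    """Logistic sigmoid, assumed here as the S-shaped transfer function."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng=random):
    """Turn a continuous position vector into a 0/1 feature mask."""
    mask = []
    for x in position:
        threshold = rng.random()  # fresh random threshold in [0, 1) per dimension
        mask.append(1 if s_shaped(x) > threshold else 0)
    return mask


def selected_features(mask):
    """Indices of the features kept by the mask."""
    return [i for i, bit in enumerate(mask) if bit == 1]


rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
position = [2.5, -3.0, 0.1, -0.4, 4.0]  # hypothetical sparrow position
mask = binarize(position, rng)
print(mask, selected_features(mask))
```

Under this assumed rule, a large positive coordinate makes the corresponding feature very likely to be selected and a large negative one makes it very likely to be dropped, while coordinates near zero are selected about half the time.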