| Text similarity calculation is a mature technology. After text classification, however, there is no excellent method for calculating the similarity between different types of text. Feature selection is a typical step in text classification, and swarm intelligence optimization algorithms are widely used for it because they process high-dimensional data capably and efficiently. Therefore, how to apply a swarm intelligence optimization algorithm to feature selection is the key to solving text feature selection. Traditional text classification methods rely mainly on manual work and suffer from low efficiency and high cost, so classifying text efficiently and accurately is a hotspot of current research. In this paper, a feature selection algorithm based on the binary cuckoo algorithm is proposed, drawing on the characteristics of swarm intelligence optimization algorithms. At the same time, in order to fully consider the similarities and differences between sentences, a sentence similarity calculation method based on modifiers is proposed. The main work of this paper is as follows: (1) To address the inherent defects of the traditional binary cuckoo algorithm in its late stage, such as slow convergence and low search accuracy, this paper proposes an improved binary cuckoo algorithm, applies it to high-dimensional text features for feature selection, and then uses the selected low-dimensional features for text classification. Firstly, a precision control coefficient is introduced to control the population updating mode of the cuckoo algorithm: depending on the calculation accuracy, the population adopts either Lévy flight or Cauchy flight, which improves the convergence speed and optimization accuracy of the population in the late stage. Secondly, to overcome the cuckoo algorithm's lack of information exchange between populations and to raise the utilization rate of information, the mutation and crossover operations of the differential evolution algorithm are introduced to improve information interaction and diversity between populations. (2) To calculate text similarity, this paper proposes a method for measuring Chinese sentence similarity based on the Language Technology Platform (LTP) and Word2Vec. To capture the differences between structures, each sentence is divided into subject-verb-object structures with a syntactic analysis tool, the longest common substring is removed from the modifier parts of each structure, and word vectors generated by Word2Vec are used to calculate the similarity. The experimental results show that, compared with the traditional binary cuckoo algorithm, the improved binary cuckoo algorithm achieves better convergence speed and optimization precision in the late iterations, and that, when used for feature selection, it efficiently reduces the dimensionality of high-dimensional data. At the same time, with the improved similarity method, the similarities and differences of the main text can be accurately obtained, which makes the result of the sentence similarity calculation more accurate. |
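The step in (2) that removes the longest common substring from the modifier parts can be illustrated with a minimal sketch. The abstract does not give the implementation, so everything here is an assumption: the function names `longest_common_substring` and `strip_common_part` are hypothetical, the comparison is done character by character (a natural fit for unsegmented Chinese text), ties are resolved in favor of the first match found, and only the first occurrence of the shared substring is removed from each modifier. The LTP parsing and Word2Vec steps are not shown.

```python
from typing import Tuple


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b.

    Uses the standard dynamic-programming recurrence over suffix lengths.
    On ties, the first match found while scanning a is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def strip_common_part(mod_a: str, mod_b: str) -> Tuple[str, str]:
    """Remove the first occurrence of the shared substring from both modifiers.

    Hypothetical helper: what remains is the part in which the two
    modifiers differ, which could then be passed to a word-vector model.
    """
    common = longest_common_substring(mod_a, mod_b)
    if not common:
        return mod_a, mod_b
    return mod_a.replace(common, "", 1), mod_b.replace(common, "", 1)


if __name__ == "__main__":
    print(strip_common_part("非常漂亮的红色", "非常漂亮的蓝色"))
```

Under these assumptions, the two modifiers in the usage line share the prefix "非常漂亮的", so only the differing remainders are left for the later word-vector comparison. How the method described in the paper handles ties or repeated substrings is not stated in the abstract.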