Application Research Of Spark-based Dragonfly Algorithm In Text Categorization

Posted on: 2022-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: D Q Liu
Full Text: PDF
GTID: 2518306722467094
Subject: Computer technology
Abstract/Summary:
With the advent of the Internet era, most people have become accustomed to using electronic documents in place of paper ones. This has produced a large volume of electronic text, most of which is stored in unorganized form. How to sort and classify these documents effectively has become a research hotspot. Feature selection is a key step in text classification: it directly affects the accuracy of the final result, so a good feature selection algorithm is essential to classification quality. This thesis studies the dragonfly algorithm, analyzes its strengths and weaknesses, and improves it to address its shortcomings. To let the dragonfly algorithm process large-scale text data efficiently, the thesis also implements a distributed dragonfly algorithm on the Spark platform, which improves its efficiency. Finally, the algorithm is applied to the feature selection stage of text classification, where it performs a second round of selection on the text feature set and improves classification accuracy. The main work of this thesis is as follows:
(1) The dragonfly algorithm is studied in depth and applied to text classification. After text features are first selected with the chi-square statistic, the dragonfly algorithm performs a second selection over the resulting feature set to improve classification accuracy.
(2) To address the low efficiency of the dragonfly algorithm on massive data, the thesis implements a distributed dragonfly algorithm on a Spark cluster, relying on the computing speed of the Spark distributed framework. It proposes a Spark-based dragonfly algorithm and applies it to the feature selection stage of text classification.
(3) The dragonfly algorithm tends to fall into local optima and converges slowly toward the global optimum on optimization problems. To address this, the thesis adds a Gaussian perturbation strategy when the optimal solution is computed in each iteration, so that the algorithm is less likely to become trapped in a local optimum, and changes the inertia weight of the dragonfly algorithm from a linear to a nonlinear schedule, which improves the overall convergence speed. The improved dragonfly algorithm is then parallelized on the Spark platform and applied to text classification.
Experiments on concrete data demonstrate the feasibility and effectiveness of applying the dragonfly algorithm to feature selection in text classification, further verify the effect of the improved dragonfly algorithm, and show that it runs as a distributed application on the Spark platform. Applied to text classification, it improves classification accuracy.
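The two improvements described in item (3) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the cosine-shaped decay in `nonlinear_inertia_weight`, the bounds `w_max` and `w_min`, the additive noise with standard deviation `sigma`, and the greedy acceptance rule in `gaussian_perturb`. The function names are hypothetical, and the sketch covers neither the full dragonfly update nor the Spark parallelization.

```python
import math
import random


def nonlinear_inertia_weight(t, max_iter, w_max=0.9, w_min=0.4):
    """Inertia weight that decays along a cosine curve rather than a straight line.

    Assumed form: the abstract only states that the schedule is nonlinear.
    """
    progress = t / max_iter  # 0.0 at the first iteration, 1.0 at the last
    return w_min + (w_max - w_min) * 0.5 * (1.0 + math.cos(math.pi * progress))


def gaussian_perturb(best, fitness, sigma=0.1, rng=random):
    """Add Gaussian noise to the current best solution and keep the better one.

    `fitness` is minimized. Keeping the candidate only when it improves
    (greedy acceptance) is an assumption, not the thesis's stated rule.
    """
    candidate = [x + rng.gauss(0.0, sigma) for x in best]
    return candidate if fitness(candidate) < fitness(best) else best


def sphere(x):
    """Toy objective used only to exercise the sketch."""
    return sum(v * v for v in x)


rng = random.Random(0)  # fixed seed so the run is repeatable
start = [1.0, -2.0, 0.5]
best = list(start)
for t in range(50):
    w = nonlinear_inertia_weight(t, 50)  # would scale the step in a full update
    best = gaussian_perturb(best, sphere, sigma=0.1, rng=rng)
```

With greedy acceptance the perturbed solution is never worse than the one it started from, so the perturbation can only help the current best escape a poor region. The cosine schedule keeps the weight high early (favoring exploration) and lowers it smoothly toward the end (favoring exploitation).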
Keywords/Search Tags: Dragonfly algorithm, text classification, feature selection, Spark