
Application Research Of Spark-based Multi-strategy Bat Algorithm In Text Feature Selection

Posted on: 2021-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Hou | Full Text: PDF
GTID: 2428330647961537 | Subject: Computer technology
Abstract/Summary:
With the advent of the 5G era and the widespread adoption of Internet of Things devices, the volume of generated data has grown exponentially. A large part of this data is stored as unstructured text. How to classify and present this massive volume of text using text-processing techniques has gradually become a research focus. Text feature selection is an important part of text classification and directly affects both the speed of text processing and the classification accuracy. It uses a specific evaluation function to count, evaluate, and rank features, and then selects the feature items with the largest evaluation values to form a subset smaller than the original text feature space. At the same time, big data processing and cloud computing technologies have gradually matured and can meet the computing and storage requirements of parallel data processing. Building on an in-depth study of the basic methods and theory of distributed text feature extraction, this thesis optimizes the selected feature subset and completes the following three tasks:

(1) The bat algorithm is used for text feature selection. The chi-square statistic is first used to reduce the dimensionality of the text features. To address the shortcomings of this method, the bat algorithm is introduced to filter the features a second time and further reduce the text dimensionality.

(2) Bat algorithm parallelization strategy. Because text classification requires an existing data set for model training, the operating efficiency of the original bat algorithm gradually decreases once the training data reaches a certain scale. This thesis takes advantage of Spark's in-memory parallel computing model and proposes a Spark-based bat algorithm for text feature selection. The iterative, parallelizable process of the bat algorithm is deployed to a Spark cluster, which improves the calculation speed and efficiency of the algorithm and saves computation time.

(3) Multi-strategy improvement of the bat algorithm. After analyzing the advantages and disadvantages of the traditional bat algorithm, a text feature selection method based on a Spark multi-strategy improved bat algorithm is proposed, and accuracy, recall, and F1 value are used for evaluation and analysis. Experiments show that the improved algorithm improves the accuracy of text classification.
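
The two-stage selection described in task (1), a chi-square pre-filter followed by a bat-algorithm search over the surviving features, can be sketched roughly as follows. This is a minimal single-machine illustration and not the thesis's implementation: the function names, the parameter values, the tanh transfer function used to make the bat algorithm binary, and the toy fitness function are all assumptions made for this example. In the setting the abstract describes, the fitness would presumably come from a classifier's performance, and its evaluation is the part that would be distributed over Spark.

```python
import math
import random


def chi_square(a, b, c, d):
    """Chi-square statistic of a term/class 2x2 contingency table.

    a: docs in the class containing the term
    b: docs outside the class containing the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def prefilter(tables, k):
    """Stage 1: keep the k terms with the largest chi-square score."""
    scores = {term: chi_square(*t) for term, t in tables.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def binary_bat(n_features, fitness, n_bats=10, n_iter=30, seed=0):
    """Stage 2: simplified binary bat search that maximizes `fitness`.

    Each bat is a 0/1 mask over the pre-filtered features. All constants
    below (frequency range, loudness decay, pulse-rate growth) are
    illustrative assumptions, not values from the thesis.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    fit = [fitness(p) for p in pos]
    loud = [1.0] * n_bats
    best_i = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        pulse = 0.5 * (1.0 - math.exp(-0.1 * t))  # pulse rate grows over time
        for i in range(n_bats):
            freq = rng.uniform(0.0, 2.0)
            cand = pos[i][:]
            for j in range(n_features):
                vel[i][j] += (pos[i][j] - best[j]) * freq
                # V-shaped transfer: a large |velocity| makes a bit flip likely.
                if rng.random() < abs(math.tanh(vel[i][j])):
                    cand[j] = 1 - cand[j]
            if rng.random() > pulse:
                # Local search: perturb the current best mask by one bit.
                cand = best[:]
                j = rng.randrange(n_features)
                cand[j] = 1 - cand[j]
            cand_fit = fitness(cand)
            # Accept an improving move with probability equal to the loudness.
            if cand_fit > fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= 0.9
            if cand_fit > best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


# Hypothetical per-term contingency counts (a, b, c, d) for one class.
tables = {"spark": (10, 0, 0, 10), "the": (5, 5, 5, 5), "bat": (8, 2, 2, 8)}
kept = prefilter(tables, k=2)


def toy_fitness(mask):
    """Stand-in fitness: summed chi-square of chosen terms minus a size penalty."""
    return sum(chi_square(*tables[t]) - 1.0 for t, bit in zip(kept, mask) if bit)


mask, score = binary_bat(len(kept), toy_fitness)
```

Here `kept` holds the terms that survive the chi-square stage and `mask` marks which of them the bat search retains. Keeping the fitness function as a plain callable is a deliberate choice for the sketch: the search loop stays unchanged if the toy score is swapped for a classifier-based one.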
Keywords/Search Tags:Feature selection, text classification, bat algorithm, Spark