Research On Text Classification Based On Firefly Algorithm And Improved KNN

Posted on:2021-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhao

Full Text:PDF

GTID:2428330614958341

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, with the rapid development of information technology, network users have become not only consumers of information but also producers of it. The network is filled with large amounts of unorganized information in text form, and in the face of such massive data it is difficult for users to find the information that is valuable to them. Text classification is a key technology for addressing this problem, since it can effectively organize and manage text data on the network. However, current text classification techniques suffer from several problems, such as low accuracy of the selected feature subset, high feature dimensionality, and low classification efficiency. To address these problems, this thesis makes improvements in the following two respects:

1. To address the low accuracy of feature subsets obtained by traditional feature selection methods, a text feature selection model based on information gain and the firefly algorithm is proposed. First, the information gain method is used to build a pre-selected feature set from the feature words with large information gain values; the firefly algorithm then searches this set for a better feature subset. To mitigate the firefly algorithm's slow convergence and its tendency to fall into local optima, a dynamically updated step factor is introduced. In the early stage of the search the step size is relatively large, which favors global search; in the later stage the step size gradually decreases as the number of iterations increases, which preserves the algorithm's local search performance and helps it reach the global optimum quickly. The experimental results show that the feature subset selected by the improved algorithm combined with information gain is more accurate than the subsets obtained with the original algorithm or with information gain, and that the feature selection model can effectively improve the accuracy of text classification.

2. To address the low classification efficiency of the k-nearest neighbor algorithm when faced with a large number of training samples, a fast k-nearest neighbor classification algorithm based on clustering and center vectors is proposed. First, the training texts of each category are clustered. Then the texts of each category are divided into an inner region and a boundary region, and the center vector is calculated. When a test text is classified, a decision can be made quickly from its distance to the center vector and the average distance within the class. If no decision can be made this way, the distance between the test text and the center of each cluster is calculated, and a training sample subset is formed from all the texts in the clusters that are relatively close to it. Finally, the k-nearest neighbor algorithm makes the classification decision on this subset. The experimental results show that the classification performance of the improved algorithm is similar to that of the traditional k-nearest neighbor algorithm while the classification time is significantly reduced, which effectively improves the efficiency of text classification.
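
The abstract does not give the formula for the dynamic step factor, so the following Python sketch is only an illustration of the idea under stated assumptions. It pairs a linearly decreasing step schedule (an assumed form; the names `dynamic_step`, `alpha_start`, and `alpha_end` are hypothetical) with the standard firefly position update, in which a firefly moves toward a brighter one and adds a random perturbation scaled by the step factor.

```python
import math
import random


def dynamic_step(t, max_iter, alpha_start=0.5, alpha_end=0.01):
    """Step factor for iteration t (0-based).

    Assumed schedule: linear decay from alpha_start to alpha_end, so early
    iterations explore widely and late iterations search locally. The thesis
    may use a different decay law.
    """
    frac = t / max(1, max_iter - 1)
    return alpha_start + (alpha_end - alpha_start) * frac


def move_firefly(x_i, x_j, alpha, beta0=1.0, gamma=1.0, rng=random):
    """Standard firefly move of x_i toward the brighter firefly x_j.

    Attractiveness decays with squared distance: beta = beta0 * exp(-gamma * r^2).
    The last term is a random step in [-alpha/2, alpha/2) per dimension.
    """
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(x_i, x_j)]


if __name__ == "__main__":
    max_iter = 5
    position, brighter = [0.0, 0.0], [1.0, 1.0]
    for t in range(max_iter):
        alpha = dynamic_step(t, max_iter)
        position = move_firefly(position, brighter, alpha)
        print(t, round(alpha, 4), [round(v, 3) for v in position])
```

For feature selection, each position vector would additionally have to be mapped to a binary mask over the pre-selected feature words; that mapping is not described in the abstract and is left out here.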
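
The two-stage decision of the fast k-nearest neighbor classifier can be sketched as below. This is a simplified illustration, not the thesis implementation: it assumes the clusters are already computed by some clustering step, it replaces the inner/boundary region split with a single assumed rule (a text is decided immediately when it lies within the average intra-class distance of exactly one class center), and the names `build_model`, `classify`, and `n_clusters` are hypothetical.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean, used as the center vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_model(clusters):
    """clusters: list of (label, [vectors]) pairs from a prior clustering step."""
    by_class = {}
    for label, members in clusters:
        by_class.setdefault(label, []).extend(members)
    class_info = {}
    for label, members in by_class.items():
        center = mean_vector(members)
        avg = sum(dist(m, center) for m in members) / len(members)
        class_info[label] = (center, avg)  # class center and average radius
    cluster_info = [(label, mean_vector(members), members)
                    for label, members in clusters]
    return class_info, cluster_info


def classify(x, model, k=3, n_clusters=2):
    class_info, cluster_info = model
    # Fast path (assumed rule): x is inside the average radius of exactly one class.
    inside = [label for label, (center, avg) in class_info.items()
              if dist(x, center) <= avg]
    if len(inside) == 1:
        return inside[0]
    # Slow path: run k-NN only on members of the n_clusters nearest clusters.
    nearest = sorted(cluster_info, key=lambda ci: dist(x, ci[1]))[:n_clusters]
    candidates = sorted((dist(x, m), label)
                        for label, _, members in nearest for m in members)
    votes = Counter(label for _, label in candidates[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    clusters = [
        ("a", [[0, 0], [0, 1]]), ("a", [[1, 0], [1, 1]]),
        ("b", [[5, 5], [5, 6]]), ("b", [[6, 5], [6, 6]]),
    ]
    model = build_model(clusters)
    print(classify([0.5, 0.6], model))  # decided on the fast path
    print(classify([4.0, 4.0], model))  # falls back to k-NN on nearby clusters
```

The time saving comes from the fact that most test texts are either decided on the fast path or compared against only the members of a few nearby clusters rather than against the whole training set.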

Keywords/Search Tags:

text classification, firefly algorithm, feature selection, clustering, central vector

Related items

1	Chinese Text Classification Based On SVM Algorithm Realization
2	Research Of Feature Selection And Weighting Algorithm In Text Classification System Based On SVM
3	Research On SVM And Text Classification
4	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier
5	The Study On Feature Selection Algorithm In Chinese Text Clustering
6	Research On Feature Selection Of Text Classification
7	Research On Chinese Text Classification Based On Support Vector Machine
8	The Design And Application Of SSVM's Text Classification Based On Feature Selection Optimization
9	Research On Improvement Of Chi-square Feature Selection And Word Vector Text Representation For News Classification
10	Research Of Web Text Classification Algorithm Based LSI And SVC