
The Research Of Text Classification Based On Artificial Bee Colony Algorithm And Improved KNN Algorithm

Posted on: 2014-01-16, Degree: Master, Type: Thesis
Country: China, Candidate: X M Qi, Full Text: PDF
GTID: 2248330392960853, Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of network technology, the volume of text information is growing geometrically, and people have access to more and more online information resources. Faced with this huge quantity of information, people's desire to obtain information rapidly, accurately and comprehensively conflicts with the complexity of those resources and the explosive growth of all kinds of data. As a key technology for processing and organizing large volumes of text, text classification can efficiently reduce information clutter and has practical significance for the efficient management and effective use of information resources. Text classification has become an important research direction in data mining. Building on an analysis and summary of text preprocessing, feature selection, text representation models, classification methods and classification performance evaluation, this thesis studies feature selection and classification methods in depth. The following work has been done in this thesis:

(1) High dimensionality of the initial feature space and redundancy in the initial feature set reduce classification accuracy. To address this, the thesis proposes an artificial bee colony feature selection algorithm based on the simulated annealing algorithm, which aims to improve classification accuracy by reducing the number of dimensions. The artificial bee colony algorithm forms the main body of the method, and simulated annealing is introduced to make up for its shortcomings. With an appropriate yield function and temperature decreasing function, experiments comparing the algorithm with the chi-square, information gain and mutual information methods show it to be feasible and effective.

(2) To overcome the shortcomings of the traditional KNN algorithm when processing large data sets, the thesis proposes an improved KNN algorithm based on de-noising and truncating. A clustering method is applied to the training text set to remove noise, and the search for the k nearest neighbors is accelerated to improve the classification efficiency of the KNN algorithm. Experiments show that the algorithm improves classification efficiency when KNN is applied to large data sets while keeping the classification accuracy of the traditional KNN algorithm.

Through the study and improvement of feature selection and classification methods in a text classification system, this thesis improves the performance of text categorization from several aspects.
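
As a rough illustration of how simulated annealing can complement a bee's neighborhood search over feature subsets, the Python sketch below accepts a worse candidate subset with a probability that shrinks as the temperature falls. It is a minimal sketch under stated assumptions, not the algorithm of the thesis: the abstract does not give the yield function or the temperature decreasing function, so the geometric cooling factor `alpha`, the one-bit `neighbor` move, and the caller-supplied `fitness` function are all hypothetical choices made for illustration.

```python
import math
import random


def accept(current_fitness, candidate_fitness, temperature, rng):
    """Metropolis rule: always accept an improvement; accept a worse
    candidate with probability exp(delta / temperature)."""
    if candidate_fitness >= current_fitness:
        return True
    delta = candidate_fitness - current_fitness  # negative here
    return rng.random() < math.exp(delta / temperature)


def cool(temperature, alpha=0.95):
    """Geometric cooling (an assumption; the thesis's function is not stated)."""
    return temperature * alpha


def neighbor(mask, rng):
    """Flip one randomly chosen feature into or out of the subset."""
    flipped = list(mask)
    i = rng.randrange(len(flipped))
    flipped[i] = not flipped[i]
    return flipped


def search(fitness, n_features, steps=200, temperature=1.0, seed=0):
    """Single-source search over boolean feature masks; in a bee colony
    setting, each food source would run a step like this."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    best = current
    for _ in range(steps):
        candidate = neighbor(current, rng)
        if accept(fitness(current), fitness(candidate), temperature, rng):
            current = candidate
        if fitness(current) > fitness(best):
            best = current
        temperature = cool(temperature)
    return best
```

In practice `fitness` would score a feature subset, for example by the validation accuracy of a classifier trained on the selected terms; early on, the high temperature lets the search leave local optima, and later it behaves almost greedily.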
Keywords/Search Tags: Text Classification, Artificial Bee Colony Algorithm, Simulated Annealing Algorithm, Clustering and De-noising, Density Cuts, K-Nearest Neighbor Algorithm