A Study Of Algorithms For Text Categorization Based On Reducing Class And Fuzzy Theory

Posted on: 2012-06-28 | Degree: Master | Type: Thesis
Country: China | Candidate: H B Ceng
GTID: 2178330335963932 | Subject: Computer application technology
Abstract/Summary:
Text categorization has become a key technology for processing large volumes of text and has gradually become an important research direction in data mining. The KNN (k-nearest neighbors) algorithm is one of the better-performing automatic text categorization techniques, but it has two shortcomings. First, KNN categorization is inefficient: it defers all computation to the classification stage, and for every text to be classified it must compute the similarity to all training texts, which slows categorization down. Second, KNN performance deteriorates when the training data are unevenly distributed across classes: if one class has far more texts in the training set than the others, categorization results tend to favor that larger class.

We propose a solution for each shortcoming. First, to speed up KNN text categorization, we propose an algorithm based on reducing classes. It divides categorization into several steps; at each step, some classes are eliminated through rapid categorization, until only one class remains, which gives the final result. Second, we apply fuzzy theory to the KNN algorithm, building a membership matrix between the training texts and the category set that is used to compute the class membership of a test text. A balance factor is introduced into the membership matrix to reduce the negative impact of differing numbers of texts per class.

Experimental results show that both schemes achieve the desired effect. The reduced-class method significantly reduces categorization time without affecting classification results, while the fuzzy method noticeably improves KNN text categorization performance, especially on imbalanced classes.
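The reducing-class idea can be illustrated with a minimal Python sketch. The abstract does not say how the "rapid categorization" at each step works, so this sketch assumes one possible reading: each class is summarized by its centroid vector, the test text is compared only with those centroids to keep the `n_keep` most similar classes, and ordinary similarity-weighted KNN then runs only over the training texts of the surviving classes. The function names, the centroid-based pruning, the collapse of the thesis's several steps into two stages, and the toy vectors are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def class_centroids(train):
    """Mean vector of each class; `train` is a list of (vector, label) pairs."""
    sums, counts = {}, Counter()
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def knn_classify(x, train, k=3):
    """Baseline KNN: similarity-weighted vote of the k nearest training texts."""
    nearest = sorted(train, key=lambda t: cosine(x, t[0]), reverse=True)[:k]
    scores = defaultdict(float)
    for vec, label in nearest:
        scores[label] += cosine(x, vec)
    return max(scores, key=scores.get)


def reduced_class_knn(x, train, centroids, n_keep=2, k=3):
    """Hypothetical reducing-class variant: prune classes cheaply, then run KNN."""
    # Stage 1 (assumed "rapid categorization"): one comparison per class.
    ranked = sorted(centroids, key=lambda c: cosine(x, centroids[c]), reverse=True)
    survivors = set(ranked[:n_keep])
    if len(survivors) == 1:
        return ranked[0]
    # Stage 2: ordinary KNN restricted to training texts of surviving classes.
    subset = [(vec, label) for vec, label in train if label in survivors]
    return knn_classify(x, subset, k)


# Toy 3-term vectors (hypothetical data).
train = [
    ([1.0, 0.1, 0.0], "sports"), ([0.9, 0.0, 0.1], "sports"),
    ([0.1, 1.0, 0.0], "politics"), ([0.0, 0.9, 0.2], "politics"),
    ([0.0, 0.1, 1.0], "tech"), ([0.2, 0.0, 0.9], "tech"),
]
cents = class_centroids(train)
x = [0.8, 0.2, 0.1]
print(reduced_class_knn(x, train, cents))  # sports
print(knn_classify(x, train))              # sports
```

The saving comes from stage 1 costing one comparison per class rather than one per training text. Whether the pruning leaves the final label unchanged, as the abstract reports for the thesis's own method, depends on how reliably the cheap first stage keeps the correct class among the survivors.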
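The fuzzy membership matrix and balance factor can likewise be sketched under stated assumptions. Here the membership matrix is initialized with the neighbor-count rule from Keller et al.'s fuzzy KNN: a training text gets 0.51 plus 0.49 times the fraction of its `k_init` nearest neighbors that share its label, and 0.49 times the fraction for every other class. The balance factor is assumed to be the average class size divided by each class's size and is multiplied into the memberships. Neither formula is taken from the thesis, whose exact definitions the abstract does not give; all function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def membership_matrix(train, classes, k_init=3):
    """Keller-style initial memberships: one {class: degree} row per training text."""
    U = []
    for j, (vec, label) in enumerate(train):
        others = [t for i, t in enumerate(train) if i != j]
        nearest = sorted(others, key=lambda t: cosine(vec, t[0]), reverse=True)[:k_init]
        counts = Counter(lbl for _, lbl in nearest)
        n = max(len(nearest), 1)
        row = {c: 0.49 * counts[c] / n for c in classes}
        row[label] += 0.51  # a text always keeps most of its own label
        U.append(row)
    return U


def balance_factors(train):
    """Assumed balance factor: average class size divided by each class's size."""
    sizes = Counter(label for _, label in train)
    avg = len(train) / len(sizes)
    return {c: avg / n for c, n in sizes.items()}


def fuzzy_knn(x, train, U, k=5, balance=None):
    """Normalized membership degree of x in each class from its k nearest texts."""
    sims = [cosine(x, vec) for vec, _ in train]
    nearest = sorted(range(len(train)), key=lambda j: sims[j], reverse=True)[:k]
    scores = defaultdict(float)
    for j in nearest:
        for c, u in U[j].items():
            b = balance[c] if balance else 1.0  # scale membership by class balance
            scores[c] += sims[j] * u * b
    total = sum(scores.values()) or 1.0
    return {c: s / total for c, s in scores.items()}


# Hypothetical imbalanced toy data: 2 "small" texts versus 6 "big" texts.
train = [([0.0, 1.0], "small"), ([0.1, 1.0], "small")] + [
    ([1.0, y], "big") for y in (1.2, 1.0, 0.8, 0.6, 0.4, 0.2)
]
classes = sorted({label for _, label in train})
U = membership_matrix(train, classes)
x = [0.4, 1.0]
plain = fuzzy_knn(x, train, U)
balanced = fuzzy_knn(x, train, U, balance=balance_factors(train))
print(max(plain, key=plain.get))        # big
print(max(balanced, key=balanced.get))  # small
```

On the toy data, both "small" texts are among the five nearest neighbors of `x`, yet the three "big" neighbors outweigh them until the balance factor scales up the smaller class. This mirrors the skew toward larger classes that the abstract describes.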
Keywords/Search Tags: Reducing Class, Fuzzy Theory, Text Categorization, KNN, Class Imbalance