Research On Key Technologies Of Chinese Text Categorization

Posted on: 2013-09-14  Degree: Master  Type: Thesis
Country: China  Candidate: W Y Gao  Full Text: PDF
GTID: 2248330395486416  Subject: Systems analysis and integration
Abstract/Summary:
With the rapid development of the Internet, the volume of text information has grown greatly, which makes automatic text classification, a technology for processing massive amounts of text, increasingly important. Automatic text classification is gradually becoming a key technology for processing and organizing large-scale text information. This thesis analyzes and summarizes the key technologies involved in the text classification process, including text preprocessing, text representation, feature selection, feature weight calculation, classification methods, and performance evaluation metrics. On that basis, it studies the feature weight calculation method and the KNN text classification algorithm in depth and proposes improved methods for both.

Feature weight calculation is an important issue in text classification because it bears on the final classification results. A detailed analysis of the traditional TFIDF weight calculation method shows that it considers only the distribution of a feature over the training set as a whole when computing that feature's weight. To address this, the thesis proposes adding two adjustment factors to the traditional TFIDF formula: one representing the distribution of the feature between categories, and one representing its distribution within each category.

The performance of the classification algorithm directly affects the classification results. A detailed analysis of the KNN text classification algorithm shows that it requires a large amount of computation during classification. To address this, the thesis proposes the RKNN text classification algorithm, which is based on reducing the training set. When RKNN computes the distance between the sample to be classified and the training samples, it does not let all features participate in the calculation at once. Instead, it selects a portion of the features at a time, in descending order of weight, and then removes from the training set some of the training samples that are farthest from the sample to be classified. In this way, RKNN maintains classification performance while reducing the amount of computation and improving running efficiency.
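
The reduction idea behind RKNN can be illustrated with a short Python sketch. This is not the thesis's implementation: the abstract does not give the distance measure, the number of features taken per round, or the pruning rule, so the squared Euclidean distance, the function name `rknn_classify`, and the parameters `batch` and `keep_ratio` are all assumptions made for illustration. Pruning on partial distances is a heuristic here, since a sample that looks close on the heaviest features may end up farther away once the remaining features are added.

```python
from collections import Counter


def rknn_classify(x, train_X, train_y, feature_weights, k=3, batch=1, keep_ratio=0.5):
    """Hypothetical RKNN-style classifier: prune far samples between feature batches."""
    # Feature indices ordered by weight, largest first.
    order = sorted(range(len(feature_weights)),
                   key=lambda j: feature_weights[j], reverse=True)
    candidates = list(range(len(train_X)))   # training samples still in play
    dist = [0.0] * len(train_X)              # running squared distance to x

    for start in range(0, len(order), batch):
        # Add the contribution of the next batch of features
        # (assumed metric: squared Euclidean distance).
        for j in order[start:start + batch]:
            for i in candidates:
                d = x[j] - train_X[i][j]
                dist[i] += d * d
        # While features remain, drop the candidates farthest from x,
        # but never keep fewer than k samples.
        if start + batch < len(order):
            candidates.sort(key=lambda i: dist[i])
            n_keep = max(k, int(len(candidates) * keep_ratio))
            candidates = candidates[:n_keep]

    # Majority vote among the k nearest surviving samples.
    candidates.sort(key=lambda i: dist[i])
    votes = Counter(train_y[i] for i in candidates[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for this example only.
train_X = [[0, 0, 0], [0, 1, 0], [5, 5, 5], [6, 5, 5], [5, 6, 5], [1, 0, 0]]
train_y = ["a", "a", "b", "b", "b", "a"]
feature_weights = [0.9, 0.5, 0.1]
print(rknn_classify([0.2, 0.1, 0.0], train_X, train_y, feature_weights))
```

In this sketch, each round touches only the surviving candidates, so later (lower-weight) features are computed for fewer training samples than in plain KNN, which is where the saving in computation would come from.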
Keywords/Search Tags: Text classification, KNN text classification algorithm, Feature selection, Feature weight