
Research And Implement On The Related Algorithms Of Chinese Text Automatic Classification System

Posted on: 2011-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Liang
GTID: 2178330332971973
Subject: Software engineering
Abstract/Summary:
Since the 1990s, the Internet has developed rapidly, and large amounts of information are now available in every field, including text, audio, and images. In recent years, finding the most useful information in this abundant and disordered text has become a central goal of the information processing field. A text categorization system based on AI techniques can automatically classify texts according to their content, helping people manage information. Text categorization has gradually been combined with other information processing techniques such as search engines, information push, and information filtering, which has effectively improved the quality of information services. Automatic text categorization is the task of assigning natural-language texts to a given set of topics, and it is a very important problem in natural language processing.

In this design, I implemented a text categorization system divided into a training module and a classification module. The training module includes the following steps: (1) preprocessing of the Chinese text; (2) feature extraction; (3) weight calculation, where this system introduces improvements. The classification module constructs a KNN classifier and uses it to classify the texts in the test collection; its parameters must be set before classification. Finally, the system determines the classification results and computes accuracy statistics. The system implements 18 combinations of algorithms and analyzes them to find the combination that gives the best accuracy.
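The abstract does not specify the exact algorithms, so the following is only a minimal Python sketch of the described pipeline under stated assumptions. Documents are assumed to be already word-segmented (Chinese segmentation itself is not shown); document-frequency (DF) ranking stands in for whichever feature-extraction method the thesis used; TF-IDF stands in for the weighting step; and the KNN classifier uses cosine similarity in the vector space model with a similarity-weighted vote. The stop-word list, all function names, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would load a much larger one.
STOPWORDS = {"的", "了", "是", "在"}

def preprocess(tokens):
    """Preprocessing (assumed): drop stop words from an already-segmented document."""
    return [t for t in tokens if t not in STOPWORDS]

def select_features(docs, size):
    """Feature extraction (assumed DF ranking): keep the `size` most frequent terms by document frequency."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:size]}

def compute_idf(docs, vocab):
    """IDF = log(N / df) + 1; the +1 keeps terms that occur in every document from getting zero weight."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc) if term in vocab)
    return {term: math.log(n / df[term]) + 1.0 for term in vocab if df[term] > 0}

def tfidf(doc, idf):
    """Sparse TF-IDF vector, L2-normalised so a dot product equals cosine similarity."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two normalised sparse vectors (iterate over the smaller one)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_classify(vec, train, k):
    """train is a list of (vector, label); the k most similar neighbours vote, weighted by similarity."""
    scored = sorted(((cosine(vec, v), label) for v, label in train),
                    key=lambda s: s[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, label in scored:
        scores[label] += sim
    return max(scores, key=scores.get)

def accuracy(predicted, gold):
    """Fraction of test documents whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy, pre-segmented training and test data (hypothetical).
train_docs = [
    (["股票", "市场", "上涨", "的"], "finance"),
    (["银行", "利率", "股票"], "finance"),
    (["足球", "比赛", "进球"], "sports"),
    (["篮球", "比赛", "冠军", "了"], "sports"),
]
test_docs = [(["股票", "利率"], "finance"), (["足球", "冠军"], "sports")]

# Training module: preprocess, select features, compute weights.
train_tokens = [preprocess(toks) for toks, _ in train_docs]
vocab = select_features(train_tokens, size=10)
idf = compute_idf(train_tokens, vocab)
train = [(tfidf(toks, idf), label) for toks, (_, label) in zip(train_tokens, train_docs)]

# Classification module: KNN with k=3, then accuracy statistics.
preds = [knn_classify(tfidf(preprocess(toks), idf), train, k=3) for toks, _ in test_docs]
print(preds, accuracy(preds, [label for _, label in test_docs]))  # → ['finance', 'sports'] 1.0
```

Replacing `select_features`, `compute_idf`/`tfidf`, or the voting rule in `knn_classify` with alternatives is one way several algorithm combinations could be compared on the same data; the abstract does not list which 18 combinations the thesis actually evaluates.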
Keywords/Search Tags: Vector Space Model, Text Categorization, KNN, Feature Extraction