Research And Implementation Of Key Technologies In Chinese Text Classification

Posted on: 2010-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z L He
Full Text: PDF
GTID: 2178330332488562
Subject: Computer application technology
Abstract/Summary:
With the popularization of information processing techniques and computer networks, a vast amount of text information is being processed, and Chinese text information in particular is growing exponentially. Text classification technology emerged to handle this information quickly and easily, and it has become a research hotspot in the field of text data mining. Based on a survey of foreign and domestic automatic classification techniques, this dissertation discusses several critical techniques that affect classification results in a Chinese text classification (CTC) system, from the automatic acquisition of text classification knowledge to the design of the classifier. It then studies in depth how to improve the precision and speed of the CTC system while keeping the system stable. Finally, the CTC system is implemented, using an improved vector space model for feature representation. Through a study of text feature selection methods, a new combined feature selection method is proposed. For classifier design, an improved Naive Bayes classifier is presented that uses an improved independent component analysis algorithm to enhance the classifier's performance. To further improve system performance, this dissertation builds a multiple-classifier engine that integrates SVM and the improved Bayes classifier and achieves better classification performance than any single classifier. Extensive experiments and a statistical analysis of their results show that the Chinese text classification method presented in this thesis achieves the goals listed above. Based on these research results, we describe the design and implementation details of the prototype system.
Keywords/Search Tags: Chinese Text Classification, Vector Space Model, Bayes Theory, Multiple Classifiers Combination
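
For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of a standard multinomial Naive Bayes text classifier over bag-of-words counts. It is not the thesis's improved classifier: the improved vector space model, the combined feature selection method, the independent component analysis step, and the SVM combination are not reproduced here. The class name `NaiveBayesTextClassifier`, the `tokenize` helper, and the toy training data are all hypothetical, and the sketch assumes the Chinese text has already been word-segmented into space-separated tokens.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: the text is already word-segmented and space-separated.
    return text.split()


class NaiveBayesTextClassifier:
    """Plain multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_doc_counts = Counter(labels)   # documents per class
        self.term_counts = defaultdict(Counter)   # term frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, doc):
        # Represent the document as term counts, ignoring unseen terms.
        counts = Counter(t for t in tokenize(doc) if t in self.vocab)
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_doc_counts.items():
            total_terms = sum(self.term_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for term, freq in counts.items():
                p = (self.term_counts[label][term] + 1) / (total_terms + vocab_size)
                score += freq * math.log(p)
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy, made-up training data used only to exercise the sketch.
train_docs = [
    "股票 市场 上涨 投资",
    "银行 利率 投资 基金",
    "球队 比赛 进球 冠军",
    "球员 比赛 训练 联赛",
]
train_labels = ["finance", "finance", "sports", "sports"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
prediction = clf.predict("投资 基金 市场")
```

A multiple-classifier engine of the kind the abstract mentions would combine the per-class scores of a classifier like this one with those of other classifiers; the combination rule used in the thesis is not stated in the abstract.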