
Word-Segmentation-Based Automatic Classification of Chinese Text and Its Implementation

Posted on: 2003-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
GTID: 2208360092990400
Subject: Control theory and control engineering
Abstract/Summary:
With the development of information technology, and especially the spread of Internet applications, the amount of information on the Web is growing exponentially. How to manage this mass of text automatically is currently an important research task. One efficient way to manage texts is to classify them, which is the problem of text classification. Automatic text classification is an important form of intelligent information processing, with wide application in areas such as news classification, e-conferences, and automatic e-mail classification.

This paper analyzes in detail the model construction and methods of Chinese text classification, including SVM, Boosting, and KNN. When classifying documents, a text classification method must solve several problems, such as obtaining training documents, building the document representation model, and selecting the classification method.

The paper focuses on word segmentation for Chinese text classification. It proposes a word segmentation method based on 2-gram phrase labeling, which combines separator marking ("separate-signs") with word-frequency statistics and can recognize words that dictionary-based methods cannot handle. Because the information this method needs is easy to obtain, it removes the dependence on dictionaries and dedicated segmentation programs, and it can replace mechanical, dictionary-based word segmentation.

Finally, an automatic classification system is built that combines KNN, Naive Bayes, and simple vector space classifiers, and it is used to validate the effectiveness of the word segmentation method.
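The abstract does not spell out the segmentation algorithm, so what follows is only a minimal sketch of the general idea under stated assumptions, not the thesis's method. Raw text is first cut at separator characters, then character bigrams that recur across the corpus (at least `min_count` times) are treated as words without consulting a dictionary, and the resulting terms feed a vector-space KNN classifier that uses cosine similarity. The separator set `FUNCTION_CHARS`, the frequency threshold, the greedy left-to-right cutting rule, the toy corpus `TOY_DOCS`, and every function name are hypothetical choices made for illustration; the thesis's 2-gram phrase labeling, Naive Bayes, and simple vector space components are not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical separator set (the thesis's actual "separate-signs" are not
# listed in the abstract): whitespace, punctuation, digits, Latin letters,
# and a few high-frequency Chinese function characters.
FUNCTION_CHARS = "的了是在和与及"
SEPARATOR_RE = re.compile(r"[\s\W\dA-Za-z_" + FUNCTION_CHARS + r"]+")

# Hypothetical four-document toy corpus: (text, label) pairs.
TOY_DOCS = [
    ("足球比赛开始了。足球队赢了比赛", "sports"),
    ("篮球比赛很精彩，篮球队赢了", "sports"),
    ("股票市场上涨，股票价格上涨", "finance"),
    ("银行利率下降，股票市场下跌", "finance"),
]


def split_on_separators(text):
    """Cut raw text into non-empty chunks that contain no separator characters."""
    return [chunk for chunk in SEPARATOR_RE.split(text) if chunk]


def learn_lexicon(texts, min_count=2):
    """Treat every character bigram seen at least min_count times as a word."""
    counts = Counter()
    for text in texts:
        for chunk in split_on_separators(text):
            for i in range(len(chunk) - 1):
                counts[chunk[i:i + 2]] += 1
    return {bigram for bigram, n in counts.items() if n >= min_count}


def segment(text, lexicon):
    """Greedy left-to-right cut: take a learned bigram if possible, else one character."""
    tokens = []
    for chunk in split_on_separators(text):
        i = 0
        while i < len(chunk):
            pair = chunk[i:i + 2]
            if len(pair) == 2 and pair in lexicon:
                tokens.append(pair)
                i += 2
            else:
                tokens.append(chunk[i])
                i += 1
    return tokens


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(v * b.get(term, 0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, lexicon, k=3):
    """Label text by majority vote among its k most similar training vectors."""
    query = Counter(segment(text, lexicon))
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    lexicon = learn_lexicon(text for text, _ in TOY_DOCS)
    training = [(Counter(segment(text, lexicon)), label) for text, label in TOY_DOCS]
    print(segment("足球比赛开始", lexicon))  # ['足球', '比赛', '开', '始']
    print(knn_classify("足球比赛很精彩", training, lexicon))  # sports
    print(knn_classify("股票价格下跌", training, lexicon))  # finance
```

No dictionary is involved: 足球 and 股票 become terms only because they recur in the corpus. On a corpus this small the threshold still admits accidental pairs such as 队赢, which shows that raw bigram frequency is only a rough signal; a realistic setup would use a much larger corpus and a tuned `min_count`.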
Keywords/Search Tags: Chinese Text Classification, Word Segmentation, Phrase Labeling of 2-gram Syntax, Information Processing, K-Nearest Neighbor, Naive Bayes, Simple Vector Space