The Research And Implementation Of Chinese Text Categorization

Posted on: 2003-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Dou
Full Text: PDF
GTID: 2168360062475112
Subject: Computer system architecture
Abstract/Summary:
Text categorization can give information retrieval more efficient search strategies and better query results. With the rapid growth of information resources on the Internet, it has become increasingly important for information processing. Drawing on techniques from information retrieval and natural language processing and on ideas from pattern recognition, this thesis focuses on Chinese text feature selection and text learning methods. Starting from two Chinese text feature extraction methods, dictionary-based word segmentation and n-gram-based feature extraction, the thesis proposes a new feature extraction method that combines the two and supplies abundant text features to the text categorization system. Experiments show that the categorization system performs well with the features obtained through the new method. We then carry out further research on the redundant features produced by the method, and the experiments show that these redundant features are valuable for text categorization. We conclude that it is not necessary to pursue content-level text feature extraction, especially in statistics-based text categorization. Finally, after analyzing the characteristics of both text learning and the learning method used, the support vector machine, we propose a method that adjusts the output of the classifier using the misclassified training texts. Experiments show that the method improves the system effectively and that satisfactory categorization effectiveness is achieved.
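The combined feature extraction described in the abstract can be illustrated with a minimal sketch. The abstract does not say which segmentation algorithm, n-gram length, or feature weighting the thesis uses, so everything below is an assumption made for illustration: forward maximum matching stands in for dictionary-based segmentation, character bigrams stand in for the n-gram features, and the function names, the `w:`/`g:` prefixes, and the toy dictionary are hypothetical.

```python
from collections import Counter


def segment_fmm(text, dictionary, max_len=4):
    """Dictionary-based segmentation by forward maximum matching (assumed variant)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def char_ngrams(text, n=2):
    """All overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combined_features(text, dictionary, n=2):
    """Count dictionary words and character n-grams in one feature bag."""
    features = Counter()
    for word in segment_fmm(text, dictionary):
        features["w:" + word] += 1  # hypothetical prefix for word features
    for gram in char_ngrams(text, n):
        features["g:" + gram] += 1  # hypothetical prefix for n-gram features
    return features


# Hypothetical toy dictionary, not taken from the thesis.
toy_dictionary = {"文本", "分类", "信息", "检索"}
feats = combined_features("文本分类", toy_dictionary)
```

In this sketch the bigram `本分` straddles the boundary between the two dictionary words, which is one plausible reading of the "redundant features" the abstract refers to: features that carry no word-level meaning but may still be statistically useful to a classifier.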
Keywords/Search Tags: Text categorization, Information retrieval, Vector space model, Text feature, Support vector machine