Research And Implementation Of Chinese Automatic Text Classification System Based On SVM

Posted on: 2011-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: C Yan
Full Text: PDF
GTID: 2178360305471649
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid growth of the Internet has produced a large volume of natural language text. How to extract useful information from this text has become a hot research issue, and it is one of the main tasks of text categorization. The geometric growth in the number of online documents and the massive dissemination of information in daily life have created the need for automatic text classification. Automatic text classification systems can help people retrieve text automatically and determine its type.

Classification is a common problem in practical applications. With the rapid development of information technology, it faces many new puzzles and challenges in both theoretical research and practical applications. SVM is a new machine learning method developed in the framework of statistical learning theory. Based on limited sample information, it seeks the best compromise between model complexity and empirical risk. Compared with traditional learning methods, SVM has the advantages of insensitivity to dimensionality, convergence to the global optimum, and good generalization ability. It offers a better solution to problems that often arise in traditional methods, such as the curse of dimensionality, local extrema, and over-learning, and in recent years it has become a very active research focus in machine learning.

This paper first introduces the research status of text categorization. Second, it studies and discusses the key techniques of text categorization, including the process of Chinese text classification, Chinese word segmentation, feature selection, feature weighting, and several commonly used classification algorithms. Third, it briefly introduces SVM theory, including statistical learning theory, the optimal separating surface of SVM, the classification of all cases, the kernel functions of SVM, and the classification steps. It then constructs an SVM classifier and introduces the framework, system flow, and function modules of the Chinese text categorization system. Finally, it lists the results of experiments with different classification algorithms, focusing on the results for different kernel functions under different feature extraction functions.
Keywords/Search Tags: automatic Chinese text classification, support vector machine, Chinese word segmentation, feature selection and extraction, categorization algorithm