Research On Chinese Text Classification System Based On Support Vector Machine

Posted on: 2008-01-03 | Degree: Master | Type: Thesis
Country: China | Candidate: C Chen | Full Text: PDF
GTID: 2178360215474272 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information society, and especially the global spread of the World Wide Web, the volume of available information continues to grow explosively. On the one hand, people benefit from this abundance; on the other hand, it is difficult to pick out the useful information from the redundant material by hand, so there is a growing need for tools that help people find what they need. As a key technology for organizing and processing large amounts of document data, text classification can largely solve the problem of information disorder and makes it convenient for users to find the required information quickly. Text classification therefore has broad application prospects.

Support Vector Machine (SVM) is a new machine learning method developed in recent years on the basis of statistical learning theory. Compared with traditional methods such as Artificial Neural Networks (ANN) and Genetic Algorithms (GA), it performs well on small-sample, nonlinear and high-dimensional pattern recognition problems. SVM has become a research focus in machine learning and has been applied successfully in many fields, such as face recognition, text classification and biological information processing. This thesis studies Chinese text classification based on SVM. The main research work is as follows:

1. The key technologies of text classification are analyzed, including Chinese word segmentation, feature selection, weight computation and different classification algorithms. Feature selection is one of the most important steps in text classification, so it receives particular emphasis. Classification performance under different feature selection methods is compared and discussed.

2. The classifier is a central component of text classification, and SVM is used as the classifier in this thesis. SVM is introduced in detail, from the basic model of machine learning to its elementary principles and different algorithms. The classification results obtained with different kernel functions are compared and discussed.

3. Choosing parameters is difficult when SVM is used as the classifier in Chinese text classification. To address this problem, a GA-SVM algorithm is proposed for Chinese text classification, combining the intelligent search capability of Genetic Algorithms (GA) with the good classification performance of SVM. The elementary theory, workflow and key technologies of GA-SVM are also introduced. Experiments show that it has good learning ability and classification performance.

4. A Chinese text classification system is designed and implemented with an SVM classifier. Experimental results show that it achieves good classification performance and can be applied in practice.
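
The idea behind GA-SVM parameter selection in item 3 can be illustrated with a minimal sketch. The abstract does not specify the encoding, genetic operators or fitness measure used in the thesis, so everything below is an assumption made for illustration: the names `ga_search` and `toy_fitness`, the real-coded chromosome holding log2(C) and log2(gamma) of an RBF-kernel SVM, the search bounds, tournament selection, blend crossover and Gaussian mutation. In a real setting the fitness function would typically be the cross-validated accuracy of an SVM trained with the candidate parameters; here a placeholder surface stands in for it so that the sketch runs with the standard library only.

```python
import random


def ga_search(fitness, bounds, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximise fitness(*genes) with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(scored):
        # Binary tournament: keep the fitter of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    best, best_score = None, float("-inf")

    for _ in range(generations):
        scored = [(fitness(*ind), ind) for ind in population]
        for score, ind in scored:
            if score > best_score:
                best, best_score = list(ind), score

        next_population = [list(best)]  # elitism: carry over the best so far
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                w = rng.random()  # blend crossover between the two parents
                child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                child[i] = min(hi, max(lo, child[i]))  # stay inside the bounds
            next_population.append(child)
        population = next_population

    return best, best_score


def toy_fitness(log2_c, log2_gamma):
    # Placeholder surface with its peak at C = 2**3, gamma = 2**-5.
    # This is NOT SVM accuracy; replace it with cross-validated accuracy
    # of an SVM trained with C = 2**log2_c and gamma = 2**log2_gamma.
    return -((log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2)


# Hypothetical search ranges for log2(C) and log2(gamma).
bounds = [(-5.0, 15.0), (-15.0, 3.0)]
best, score = ga_search(toy_fitness, bounds)
C, gamma = 2.0 ** best[0], 2.0 ** best[1]
print(f"C = {C:.4g}, gamma = {gamma:.4g}, fitness = {score:.4g}")
```

Searching in log2 space is a common choice because useful values of C and gamma tend to span several orders of magnitude; whether the thesis does the same is not stated in the abstract.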
Keywords/Search Tags: Text classification, Feature selection, Parameter selection, GA-SVM