Research On Chinese Text Categorization Based On The Integrated Support Vector Machine Method

Posted on: 2015-01-19  Degree: Master  Type: Thesis
Country: China  Candidate: P L You  Full Text: PDF
GTID: 2308330452456884  Subject: Software engineering
Abstract/Summary:
With the rapid development of computer networks and information technology, the scale of data in every field is growing considerably, and this massive growth of information makes it difficult for people to find what they need. Text classification technology not only helps people find what they need quickly, but also brings order to the chaos of information on the network. Researchers in natural language processing and computer science therefore attach great importance to text classification.

The main task of text classification is to automatically assign a text to a category according to its content. Naive Bayes, support vector machines (SVM), and K-nearest neighbor are all common text classification algorithms, and SVM is one of the best-performing among them. There are two ways to train a high-performance classifier. One is to optimize the parameters of the training model to produce a single classifier with high accuracy; the other is to combine multiple classifiers into an integrated (ensemble) classifier. The integrated classifier performs better than any of the individual classifiers it is built from.

The main research work is as follows:
(1) Discuss the whole process of Chinese text classification, including word segmentation, stop-word removal, text representation, weight calculation, and feature reduction; the basic theory of common classification algorithms such as Naive Bayes, K-nearest neighbor, and support vector machines; and the evaluation of text classification performance.
(2) Study support vector machines and ensemble learning theory, and introduce two classic ensemble learning methods: Bagging and Boosting.
(3) Compare SVM with other common classification algorithms, and make a comparative study of the performance of SVM with different kernel functions and of Bagging and Boosting algorithms based on support vector machines. Finally, propose a voting algorithm based on SVM-Bagging that uses different kernel functions and significantly improves classification accuracy.
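
The voting idea in item (3) can be illustrated with a minimal sketch. The abstract does not give the voting rule or the kernels used, so this assumes plain majority voting over classifiers trained on bootstrap samples. To stay dependency-free, the base learners are nearest-centroid classifiers with two different distance functions, a hypothetical stand-in for SVMs with different kernels. All function names and the toy data are illustrative and are not taken from the thesis.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Bagging resampling step: draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def make_centroid_trainer(distance):
    """Hypothetical stand-in for 'an SVM with a given kernel'.

    Returns a trainer that fits a nearest-centroid classifier using
    `distance`; swapping the distance plays the role of swapping kernels.
    """
    def train(samples):
        sums, counts = {}, {}
        for features, label in samples:
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        centroids = {
            label: [total / counts[label] for total in acc]
            for label, acc in sums.items()
        }

        def classify(x):
            return min(centroids, key=lambda label: distance(x, centroids[label]))

        return classify

    return train


def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def train_bagged_ensemble(data, trainers, rounds_per_trainer, seed=0):
    """Train each base learner type on several independent bootstrap samples."""
    rng = random.Random(seed)
    return [
        trainer(bootstrap_sample(data, rng))
        for trainer in trainers
        for _ in range(rounds_per_trainer)
    ]


def predict(models, x):
    """Combine the ensemble members' predictions by majority vote."""
    return majority_vote([model(x) for model in models])


# Toy two-dimensional "document vectors" (assumed data, for illustration only).
data = (
    [([1.0, 0.0], "sports")] * 5 + [([0.8, 0.2], "sports")] * 5
    + [([0.0, 1.0], "finance")] * 5 + [([0.2, 0.8], "finance")] * 5
)
trainers = [make_centroid_trainer(euclidean), make_centroid_trainer(manhattan)]
models = train_bagged_ensemble(data, trainers, rounds_per_trainer=3)
print(predict(models, [0.9, 0.1]))
```

In a real experiment the stand-in trainers would be replaced by SVMs with, for example, linear, polynomial, and RBF kernels, and the feature vectors would come from the segmentation, weighting, and feature-reduction steps listed in item (1).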
Keywords/Search Tags: Chinese text classification, Feature dimension reduction, Support vector machine, Ensemble learning