Research On Text Classification Based On Multi-class Soft Interval Support Vector Machines

Posted on: 2009-08-15  Degree: Master  Type: Thesis
Country: China  Candidate: G Q Tan
GTID: 2178360245986554  Subject: Computer software and theory
Abstract/Summary:
Text classification is a method of managing texts efficiently and an important part of intelligent information processing. Because it improves both the efficiency and the effectiveness of information retrieval, good classification performance is the central concern.

The Support Vector Machine (SVM) is a relatively new machine learning algorithm with good generalization ability and high classification accuracy. It has become a research hotspot following earlier work on pattern recognition and artificial neural networks, and SVM-based text classification has accordingly attracted much attention in recent years.

This thesis applies the Support Vector Machine and kernel functions to text classification. It first reviews the general development of text classification and some of its techniques. It then discusses Statistical Learning Theory and the Support Vector Machine, with particular attention to kernel functions, and systematically studies the signal analysis techniques of text classification based on SVM and Statistical Learning Theory.

The main work of the thesis is as follows. Starting from an in-depth discussion of the Support Vector Machine, we obtain a soft-margin ("soft interval") SVM model called C-SVC. Based on our study of C-SVC, we define a parameter s and construct a new soft-margin SVM model, which we call s-SVM. Following the theory of kernel functions, and drawing on the ideas behind the polynomial kernel and the radial basis kernel, we give a selection strategy for a support vector ratio kernel function that yields good results for s-SVM. We solve multi-class classification in s-SVM with a pairwise (paired-category) strategy and give the steps of the multi-class classification algorithm based on the s-SVM model. Finally, we construct a text classification system whose core module uses C-SVC and s-SVM.

We test the system on the Reuters-21578 corpus, measure the performance of C-SVC and s-SVM, compare them with traditional text classification algorithms, and analyze the advantages and disadvantages of the approach.
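The pairwise multi-class strategy mentioned above can be illustrated with a minimal sketch. It is not the thesis's s-SVM or its kernel: it assumes a linear soft-margin classifier trained by plain subgradient descent on the hinge loss, with numeric feature vectors standing in for documents, and all names (train_linear_svm, train_ovo, predict_ovo) and the toy data are hypothetical. One binary classifier is trained per pair of categories, and a new sample receives the category that wins the most pairwise votes.

```python
from collections import Counter
from itertools import combinations


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=300):
    """Linear soft-margin SVM (labels in {-1, +1}) via per-sample
    subgradient steps on the hinge loss plus an L2 penalty on w.
    Assumption: a simple stand-in for a real C-SVC solver."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # Regularizer step; the 1/n factor spreads it over the samples.
            w = [wj - lr * wj / n for wj in w]
            if yi * score < 1:  # inside the margin or misclassified
                w = [wj + lr * C * yi * xj for wj, xj in zip(w, xi)]
                b += lr * C * yi
    return w, b


def train_ovo(X, labels, **kwargs):
    """Train one binary classifier for every pair of categories."""
    models = {}
    for pos, neg in combinations(sorted(set(labels)), 2):
        Xp = [x for x, l in zip(X, labels) if l in (pos, neg)]
        yp = [1 if l == pos else -1 for l in labels if l in (pos, neg)]
        models[(pos, neg)] = train_linear_svm(Xp, yp, **kwargs)
    return models


def predict_ovo(models, x):
    """Each pairwise classifier casts one vote; the top category wins."""
    votes = Counter()
    for (pos, neg), (w, b) in models.items():
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        votes[pos if score >= 0 else neg] += 1
    return votes.most_common(1)[0][0]


# Toy data: three well-separated categories in two dimensions.
X = [[0, 0], [1, 0], [0, 1], [1, 1],
     [6, 0], [7, 0], [6, 1], [7, 1],
     [0, 6], [0, 7], [1, 6], [1, 7]]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

models = train_ovo(X, labels)
print(predict_ovo(models, [0.5, 0.5]))
```

With k categories this scheme trains k(k-1)/2 binary classifiers, each on only the samples of its two categories.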
Keywords/Search Tags: text classification, support vector machine, statistical learning, soft interval, multi-class classification