
Research and Implementation of a Chinese Text Categorization Algorithm Based on SVM

Posted on: 2009-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Xiong
Full Text: PDF
GTID: 2178360245955334
Subject: Computer application technology
Abstract/Summary:
The rapid growth of network information has driven research and development in automatic text categorization. Among categorization algorithms, the support vector machine (SVM) has attracted much attention from researchers because of its solid theoretical foundation and good categorization performance. Building on a study of the basic theory of SVMs, the dissertation analyses and discusses SVM-based text categorization in depth, adjusts and improves the structure of the traditional SVM-based text categorization system, implements such a system in a more reasonable manner, and improves the binary-tree (bintree) multi-class text categorization algorithm based on SVMs. On the basis of this theoretical study, the dissertation makes the following contributions:

The traditional SVM-based text categorization system mainly consists of training and testing modules. In implementing the categorization system, the dissertation adjusts and improves this structure by making text pretreatment an independent module that comprises Chinese word segmentation, feature extraction, feature selection and text vector formation. The module provides a common text input interface for training and testing, while different instances of it output the training and the testing text vectors. The training and testing modules therefore do not have to deal with text vector formation, which eases development and maintenance of the system and gives the system better performance. In training, the dissertation solves the quadratic programming problem by the method of feasible directions and describes the solution algorithm.

Extending SVMs from two-class to multi-class problems is an active research topic.
Among the various methods, the binary-tree (bintree) multi-class text categorization algorithm based on SVMs is more efficient than the others in training and classification, and it resolves the inseparability problem, so it is a good method. The dissertation systematically studies and analyses this algorithm and then improves it in several respects. In particular, when the set of testing texts is very large, the texts are first clustered and then classified. The aim of the improvement is to make the computation for each testing text more targeted, so that it does not always have to start from the root node of the binary tree. This can improve the efficiency of text categorization and make it more accurate when both the number of testing texts and the number of classification functions are very large.
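To make the binary-tree idea concrete, here is a minimal sketch of how a testing text vector could be classified by descending such a tree from the root. It is not the dissertation's implementation: the `Node`, `Leaf` and `classify` names, the hand-set linear decision functions (standing in for trained SVMs), the category labels and the two-dimensional feature vectors are all assumptions made for illustration. The improved variant that clusters the testing texts first is not shown, since the abstract does not give its details.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass
class Leaf:
    label: str  # a single category reached at the bottom of the tree


@dataclass
class Node:
    # Hypothetical linear decision function w.x + b; in a real system
    # these values would come from a trained two-class SVM.
    weights: Sequence[float]
    bias: float
    positive: Union["Node", Leaf]  # branch taken when w.x + b >= 0
    negative: Union["Node", Leaf]  # branch taken when w.x + b < 0


def decision_value(node: Node, x: Sequence[float]) -> float:
    """Evaluate the node's two-class decision function on vector x."""
    return sum(w * xi for w, xi in zip(node.weights, x)) + node.bias


def classify(tree: Union[Node, Leaf], x: Sequence[float]) -> Tuple[str, int]:
    """Walk from the root to a leaf; return (label, classifiers evaluated)."""
    current = tree
    evaluations = 0
    while isinstance(current, Node):
        evaluations += 1
        if decision_value(current, x) >= 0:
            current = current.positive
        else:
            current = current.negative
    return current.label, evaluations


# Toy tree for three made-up categories: the root separates "sports"
# from the rest, and the second node separates "finance" from "politics".
TREE = Node(
    weights=(1.0, 0.0), bias=-0.5,
    positive=Leaf("sports"),
    negative=Node(
        weights=(0.0, 1.0), bias=-0.5,
        positive=Leaf("finance"),
        negative=Leaf("politics"),
    ),
)

if __name__ == "__main__":
    for vector in ([0.9, 0.1], [0.1, 0.8], [0.1, 0.2]):
        print(vector, classify(TREE, vector))
```

A tree over k categories needs only k - 1 two-class classifiers, and each text evaluates at most as many of them as the tree is deep, which is where the savings in training and classification come from.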
Keywords/Search Tags:Text Categorization, Support Vector Machine, Statistical Learning Theory, Quadratic Programming