
Research On Text Categorization Based On Support Vector Machine

Posted on: 2006-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: C X Cui
Full Text: PDF
GTID: 2168360155957013
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of computer technology and the World Wide Web, the volume of electronic documents on the Internet has increased sharply. Faced with so much information, people urgently need a way to obtain the information they need quickly and accurately. Text categorization is a foundational technology for information filtering, information retrieval, search engines, text databases, digital libraries and similar fields. Because of these broad application prospects, it has become an active research topic. This thesis systematically studies automatic text categorization from three aspects: vector model representation, feature selection and classifier training.
(1) The whole process of text representation is discussed: word segmentation, building the stop-word list, feature selection, weight computation and generating the vector space. To address the influence of stop words, a stop-word list suited to text categorization is built, which reduces the vector dimension. Existing feature selection methods are introduced and compared, and a feature selection function suited to SVM is constructed, based on within-category frequency.
(2) Three well-performing text categorization methods, Naive Bayes, KNN and SVM, are introduced and compared. The experimental results indicate that SVM is the better method, with relatively stable behavior, high precision and better overall performance.
(3) Combining the advantages of rough sets and SVM, a text categorization method based on rough sets and SVM is proposed. By using rough set reduction, this method cuts down the vector dimension and reduces the training time of SVM.
(4) One useful text categorization experimental system was carried...
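The text representation steps named in item (1), stop-word removal, weight computation and vector space generation, can be illustrated with a minimal sketch. It is not the thesis's implementation: the stop-word list, the sample documents and the function names are hypothetical, whitespace splitting stands in for real word segmentation (Chinese text would need a proper segmenter), and plain TF-IDF is assumed as the weighting scheme.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "is", "and", "in", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    token_lists = [tokenize(d) for d in docs]
    # The vocabulary defines the dimensions of the vector space.
    vocab = sorted({w for tokens in token_lists for w in tokens})
    n_docs = len(token_lists)
    # Document frequency: number of documents that contain each term.
    df = Counter(w for tokens in token_lists for w in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)  # raw term frequency in this document
        vectors.append([tf[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


# Hypothetical sample documents.
docs = [
    "the price of the stock is rising",
    "the team is in the final match",
    "stock price and match result",
]
vocab, vectors = tfidf_vectors(docs)
```

Removing stop words before building the vocabulary is what shrinks the vector dimension here: none of the words in `STOP_WORDS` becomes a coordinate, so each document vector has one entry per remaining term.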
Keywords/Search Tags: text categorization, feature selection, rough sets, SVM