Support Vector Machine Application In Text Categorization

Posted on: 2007-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: H B Zou
GTID: 2178360185995936
Subject: Computer application technology
Abstract/Summary:
Text categorization is a foundational technology for information filtering, information retrieval, search engines, text databases, digital libraries, and similar fields, and it has broad application prospects. The support vector machine (SVM) is a new generation of machine learning technique based on statistical learning theory, and it handles small-sample learning problems well. It uses kernel functions to turn a non-linear problem into a linear one, which reduces the complexity of the algorithm. SVM has become a new research focus in machine learning. This thesis studies the application of SVM to text categorization from three aspects: text feature selection, incremental SVM algorithms, and multi-class text categorization.

In text categorization, feature spaces with tens of thousands of dimensions are common. For a categorization algorithm to be effective, a feature selection method must be used to reduce the dimension of the feature space. This thesis analyzes and compares commonly used text feature selection methods and proposes a text feature selection method based on SVM; experiments show that the method is feasible.

The thesis then analyzes the characteristics of SVM in depth and introduces common incremental learning algorithms. The analysis shows that the learning parameters are difficult to determine in incremental learning, so the thesis uses the ν-SVM method to propose an incremental SVM learning method that adjusts the incremental training parameters automatically. It also gives the method's primal optimization problem, Lagrangian function, and Wolfe dual problem.

Traditional text categorization tools require laborious preprocessing to collect positive and negative training examples. Because collecting negative examples is very difficult, and in order to remove the need for them and apply SVM effectively to multi-class text categorization, the thesis introduces and analyzes three commonly used multi-class text categorization methods and proposes an SVM-based multi-class categorization method that uses only positive examples. The goal of this method is to learn from a positive dataset without labels and then perform multi-class classification, achieving categorization precision as good as that obtained with a labeled dataset.
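
To make the feature-selection step in the abstract concrete, the following is a minimal sketch of one commonly used criterion, the chi-square statistic between a term and a class. The abstract does not name the methods it compares, so the choice of chi-square is an assumption, and this is not the SVM-based selection method the thesis proposes. The function names `chi_square_scores` and `select_top_k` and the toy documents are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term by the chi-square statistic of its 2x2
    term/class contingency table for the class `target`.

    docs   -- list of token lists, one per document
    labels -- list of class labels, aligned with docs
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    # Document frequencies of each term inside and outside the target class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == target else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # target documents containing the term
        b = df_neg[term]   # other documents containing the term
        c = n_pos - a      # target documents without the term
        d = n_neg - b      # other documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term (or class) carries no information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (illustrative only, not data from the thesis).
toy_docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "bank", "price"],
]
toy_labels = ["sports", "sports", "finance", "finance"]

selected = select_top_k(chi_square_scores(toy_docs, toy_labels, "sports"), 4)
print(selected)
```

The reduced vocabulary returned by `select_top_k` would then define the feature space passed to the classifier. Note that chi-square is two-sided: a term that is strongly associated with the absence of the target class scores as highly as one associated with its presence.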
Keywords/Search Tags: support vector machine, text categorization, incremental algorithm, multi-class categorization