
Study On Text Categorization Method Based On Support Vector Machine

Posted on: 2007-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: W Ying
Full Text: PDF
GTID: 2178360212480626
Subject: Systems Engineering
Abstract/Summary:
Data mining is a new technology used to extract useful information and knowledge from large databases, and text classification is one of its important tasks. Building effective text mining algorithms for massive, high-dimensional data is one of the research directions of data mining. To address this, the thesis studies in depth several problems of text classification with support vector machines (SVMs). The main contributions are as follows:
First, after analyzing the main reasons why SVM training is slow, we employ an algorithm that pre-extracts support vectors (SVs) together with a circulated iterative algorithm to speed up SVM training. On this basis, a new two-class text categorization algorithm is presented, which uses the pre-extracted support vectors as the initial working set and a fuzzy circulated iterative algorithm as the SVM training method. Compared with conventional support vector machines, the proposed method has much higher computational efficiency.
Second, to overcome the problems and defects of existing SVM multiclass classification methods, a new binary-tree-based SVM multiclass classification method is employed and applied to multiclass text categorization. Several simulations demonstrate that, compared with existing methods, the new method has the following advantages: fewer SVMs need to be trained, training and decision are faster, and unclassifiable regions no longer exist.
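As a rough illustration of the binary-tree idea, and not the thesis's actual algorithm (the abstract does not describe how the tree is built), the Python sketch below decomposes a k-class problem into a tree of two-class decisions. Everything named here is an assumption made for illustration: the rule of splitting the class list in half at each node, the function names, the toy data, and `centroid_classifier`, which is a simple stand-in used in place of a real SVM trainer so the example has no dependencies.

```python
def centroid_classifier(X, y):
    """Stand-in for an SVM trainer (NOT an SVM): separates the two class
    centroids with their perpendicular bisector. Labels are +1 / -1.
    Returns a decision function f with f(x) > 0 meaning label +1."""
    dim = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    mean_pos = [sum(p[j] for p in pos) / len(pos) for j in range(dim)]
    mean_neg = [sum(p[j] for p in neg) / len(neg) for j in range(dim)]
    w = [a - b for a, b in zip(mean_pos, mean_neg)]
    bias = -sum(wj * (a + b) / 2 for wj, a, b in zip(w, mean_pos, mean_neg))
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + bias


def build_tree(X, y, classes, train_binary):
    """Recursively split `classes` in half (an assumed splitting rule) and
    train one binary classifier per internal node. Class labels must not be
    tuples, because tuples are used to represent internal nodes."""
    if len(classes) == 1:
        return classes[0]  # leaf: a single class label
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    # Keep only samples of the classes handled by this node; left half -> +1.
    subset = [(x, 1 if label in left else -1)
              for x, label in zip(X, y) if label in classes]
    decide = train_binary([x for x, _ in subset], [s for _, s in subset])
    return (decide,
            build_tree(X, y, left, train_binary),
            build_tree(X, y, right, train_binary))


def predict(tree, x):
    """Walk from the root to a leaf; every sample ends at exactly one class."""
    while isinstance(tree, tuple):
        decide, left, right = tree
        tree = left if decide(x) > 0 else right
    return tree


def count_classifiers(tree):
    """Number of binary classifiers (internal nodes) in the tree."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_classifiers(tree[1]) + count_classifiers(tree[2])


# Toy data: four well-separated classes in the plane, two points each.
X = [(0, 0), (1, 1), (0, 5), (1, 6), (5, 0), (6, 1), (5, 5), (6, 6)]
y = [0, 0, 1, 1, 2, 2, 3, 3]
tree = build_tree(X, y, [0, 1, 2, 3], centroid_classifier)
print(count_classifiers(tree))            # classifiers trained for k = 4
print([predict(tree, x) for x in X])      # predicted class per training point
```

For k classes such a tree trains k - 1 binary classifiers, compared with k for one-vs-rest and k(k - 1)/2 for one-vs-one, and a balanced tree evaluates only about log2(k) of them per prediction. Because every path ends at a single leaf, no input is left unassigned, which is consistent with the abstract's claim that unclassifiable regions disappear. In practice, `centroid_classifier` would be replaced by an actual SVM trainer.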
Keywords/Search Tags: text mining, support vector machines (SVM), two-class text categorization, multiclass text categorization