Study On Text Classification Based On Multi-class Support Vector Machines

Posted on: 2008-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: S D Du
Full Text: PDF
GTID: 2178360215990926
Subject: Computer system architecture
Abstract/Summary:
With the rapid growth of information, mining natural language documents and classifying them into a predefined set of semantic categories has become one of the key methods for organizing text information. This task, text classification, is a central task of text mining. The support vector machine (SVM) is a machine learning technique developed by Vapnik in the mid-1990s that learns by solving an optimization problem. It is characterized by a maximal-margin hyperplane, the use of kernels (justified by Mercer's theorem), convex optimization with no local minima, sparseness of the solution, and capacity control obtained by acting on the margin. A large number of experiments and applications in text classification and pattern recognition have shown that support vector machines have not only a simpler structure but also better performance, especially better generalization ability. However, the support vector machine approach was originally developed to solve binary classification problems. How to extend it to multi-class problems and apply it to text classification is the key research point of this paper.

In this paper, the text mining problem is introduced first, and then popular multi-class support vector machine algorithms are studied. The binary tree architecture support vector machine (BT-SVM) is an important type of multi-class support vector machine. After analyzing some traditional multi-class support vector machine algorithms, we propose improvement strategies for BT-SVM algorithms and apply them to text classification, the key task of text mining. The main work is as follows:

① An overview of a variety of algorithms and techniques for text mining is given. We analyze in depth the many existing text classification algorithms and the principle of SVM. How to apply SVM to classification mining is also a research issue.

② Support vector machines for multi-class problems are discussed. Several methods have been proposed, including "one-against-all", "one-against-one", DAGSVM, and multi-class SVM classification based on a binary tree. Their advantages, disadvantages, and performance are compared.

③ BT-SVM for multi-class classification is discussed. Several tree architecture strategies for BT-SVM have been proposed, and their training time, sample capacity and decision plan are compared. We then propose an improvement strategy for the BT-SVM algorithm based on a binary tree, present the algorithm in detail, and analyze its characteristics through experiments.

④ A text classifier based on the improved BT-SVM is studied. To address the problems of traditional text classifiers based on support vector machines, we propose a new multi-class text classifier based on the improved BT-SVM and apply it to multi-class text classification in an experiment.
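The multi-class strategies compared above differ mainly in how many binary SVMs they train and how many they evaluate per test document. The following is a minimal sketch of that bookkeeping, not the thesis's algorithm: the function names are hypothetical, and the balanced half-and-half split of the class list is an assumption, since the abstract does not state how the improved BT-SVM builds its tree.

```python
from math import ceil, log2


def classifier_counts(k):
    """Binary SVMs trained / evaluated per sample for k classes.

    Standard counts for each decomposition; the binary-tree row assumes
    a balanced tree (an assumption, not the thesis's stated strategy).
    """
    if k < 2:
        raise ValueError("need at least two classes")
    pairs = k * (k - 1) // 2
    return {
        "one-against-all": {"trained": k, "evaluated": k},
        "one-against-one": {"trained": pairs, "evaluated": pairs},
        "DAGSVM": {"trained": pairs, "evaluated": k - 1},
        "binary tree": {"trained": k - 1, "evaluated": ceil(log2(k))},
    }


def build_balanced_tree(classes):
    """Split the class list in half recursively.

    A leaf is a single class label (must not itself be a tuple).
    An internal node is a (left, right) tuple and stands for one
    binary SVM separating the left group from the right group.
    """
    classes = list(classes)
    if len(classes) == 1:
        return classes[0]
    mid = (len(classes) + 1) // 2
    return (build_balanced_tree(classes[:mid]),
            build_balanced_tree(classes[mid:]))


def count_internal_nodes(node):
    """Number of binary SVMs the tree would need to train."""
    if not isinstance(node, tuple):
        return 0
    return 1 + count_internal_nodes(node[0]) + count_internal_nodes(node[1])


def tree_depth(node):
    """Worst-case number of binary SVMs evaluated for one document."""
    if not isinstance(node, tuple):
        return 0
    return 1 + max(tree_depth(node[0]), tree_depth(node[1]))


if __name__ == "__main__":
    # Hypothetical category names, used only for illustration.
    labels = ["sports", "finance", "politics", "science", "arts"]
    tree = build_balanced_tree(labels)
    print(tree)
    print(count_internal_nodes(tree), tree_depth(tree))
    print(classifier_counts(len(labels)))
```

A skewed tree that peels off one class per level would still train k - 1 classifiers but could evaluate up to k - 1 of them for a single document, which is one reason the choice of tree architecture matters.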
Keywords/Search Tags: Support Vector Machines, Feature Selection, Text Mining, Binary Tree Multi-class SVM, Text Classifier