Research And Application Of Text Classification Algorithm Based On SVM

Posted on:2010-04-14

Degree:Master

Type:Thesis

Country:China

Candidate:P Wu

Full Text:PDF

GTID:2178360302960779

Subject:Computer application technology

Abstract/Summary:

With the rapid development of information technology and information networks, mining useful knowledge from large volumes of data has become an important research area. The support vector machine (SVM) is a learning method developed in recent years on the foundations of statistical learning theory. It is gaining popularity because of its many attractive features and promising empirical performance in nonlinear and high-dimensional pattern recognition.

Although statistical learning theory (SLT) has a solid theoretical foundation and rigorous theoretical analysis, many problems between theory and application remain to be studied and solved. For example, current research in the field tries to design a classifier that performs optimally on all possible samples. In many practical problems, however, it is neither possible nor necessary to identify all samples with such a classifier; often only some specific ones need to be identified. This calls for a more economical classifier that can identify and classify specific unlabeled samples starting from labeled ones. In contrast to traditional inductive inference, this approach is called transductive inference. The transductive support vector machine (TSVM) takes a particular test set into account and tries to minimize misclassifications of just those examples. The progressive transductive support vector machine (PTSVM) can automatically adapt to different data distributions and realizes transductive learning of support vectors in a more general sense. However, PTSVM's pairwise labeling in the margin band is unnatural and produces errors easily. Although dynamical adjusting offers some error recovery, its ability is limited. To address these shortcomings of the PTSVM learning algorithm, an improved cache-based PTSVM learning algorithm (ICPTSVM) is presented.
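For reference, the TSVM idea described above is commonly written as a soft-margin optimization over both the classifier and the unknown test labels. The form below is the standard textbook formulation, not one quoted from this thesis; the symbols (n labeled samples x_i with labels y_i, k unlabeled samples x_j^* with unknown labels y_j^*, slack variables \xi, and penalty weights C and C^*) are notation assumed here for illustration.

```latex
\begin{aligned}
\min_{y_1^*,\dots,y_k^*,\; w,\; b,\; \xi,\; \xi^*} \quad
  & \tfrac{1}{2}\lVert w \rVert^2
    + C \sum_{i=1}^{n} \xi_i
    + C^* \sum_{j=1}^{k} \xi_j^* \\
\text{subject to} \quad
  & y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,n, \\
  & y_j^* (w \cdot x_j^* + b) \ge 1 - \xi_j^*, \quad \xi_j^* \ge 0, \quad j = 1,\dots,k, \\
  & y_j^* \in \{-1, +1\}, \quad j = 1,\dots,k.
\end{aligned}
```

Because the test labels y_j^* are themselves optimization variables, the problem is combinatorial, which is why progressive schemes label the unlabeled samples a few at a time.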
The ICPTSVM algorithm replaces pairwise labeling in the margin band and dynamical adjusting with pairwise labeling in the range and error correction on a cache. It thereby greatly reduces the number of mislabeled samples, improves speed and accuracy, and eliminates the dead cycle (endless loop) of the PTSVM learning algorithm. Experiments on the Wisconsin Breast Cancer dataset from UCI and the Svmguide3 dataset of CWH03a show that the algorithm is valid.

In this paper, the improved cache-based PTSVM learning algorithm is applied to the full-text retrieval system of the general application platform of the Dalian police. It significantly improves the accuracy of information retrieval and work efficiency. The system design and implementation presented here are general, so they also offer some guidance for implementing retrieval systems in other fields.
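The baseline that ICPTSVM improves on, pairwise labeling in the margin band, can be illustrated with a toy sketch. This is not the thesis's algorithm: the abstract does not specify how labeling in the range or the cache-based error correction work, so neither is reproduced here, and dynamical adjusting is omitted as well. The sketch assumes one common reading of PTSVM, in which each round pseudo-labels the unlabeled point with the largest score in (0, 1) as positive and the one with the smallest score in (-1, 0) as negative, then retrains. The tiny subgradient-descent trainer stands in for a real SVM solver, and all function names, hyperparameters, and data points are hypothetical.

```python
# Toy sketch: progressive pairwise labeling in the margin band (PTSVM-style).
# Names, hyperparameters, and data are illustrative assumptions only.

def decision(w, b, x):
    """Signed score f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Samples with y * f(x) < 1 are inside the margin or misclassified.
        viol = [(xi, yi) for xi, yi in zip(X, y) if yi * decision(w, b, xi) < 1]
        for d in range(dim):
            hinge_grad = sum(yi * xi[d] for xi, yi in viol)
            w[d] = (1 - lr * lam) * w[d] + lr * hinge_grad / n
        b += lr * sum(yi for _, yi in viol) / n
    return w, b


def progressive_transduction(X_lab, y_lab, X_unl):
    """Label X_unl by repeatedly pseudo-labeling points in the margin band."""
    X, y = list(X_lab), list(y_lab)
    pending = list(range(len(X_unl)))  # indices of still-unlabeled points
    while True:
        w, b = train_linear_svm(X, y)
        scores = {i: decision(w, b, X_unl[i]) for i in pending}
        pos = [i for i in pending if 0 < scores[i] < 1]
        neg = [i for i in pending if -1 < scores[i] < 0]
        if not pos and not neg:
            break  # nothing left inside the margin band
        picks = []
        if pos:  # most confident positive candidate inside the band
            picks.append((max(pos, key=scores.get), 1))
        if neg:  # most confident negative candidate inside the band
            picks.append((min(neg, key=scores.get), -1))
        for i, label in picks:
            X.append(X_unl[i])
            y.append(label)
            pending.remove(i)
    # Classify every unlabeled point with the final model.
    return [1 if decision(w, b, x) >= 0 else -1 for x in X_unl]


X_lab = [(2.0, 2.0), (-2.0, -2.0)]
y_lab = [1, -1]
X_unl = [(1.0, 1.0), (0.5, 1.0), (3.0, 3.0),
         (-1.0, -1.0), (-0.5, -1.0), (-3.0, -3.0)]
labels = progressive_transduction(X_lab, y_lab, X_unl)
```

A pseudo-label assigned in an early round is never revisited in this sketch, so an early mistake would propagate into every later model. That weakness is what the dynamical adjusting of PTSVM, and the cache-based error correction described above, are meant to mitigate.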

Keywords/Search Tags:

Statistical Learning Theory, Support Vector Machine, Transductive Inference, Cache, Pairwise Labeling In The Range

Related items

1	A Research On Transductive Support Vector Machine
2	Research On Some Problems Of Support Vector Machine Learning Algorithm
3	Study Of Support Vector Machines Algorithm Based On Statistical Learning Theory
4	Studies Of Some Problems In Support Vector Machines And Semi-supervised Learning
5	A Kind Of Transductive Inference WEB Data Mine Based On Support Vector Machine
6	Research On Several Problems In Support Vector Machine And Support Vector Domain Description
7	Research Of Support Vector Machine Learning Algorithms
8	The Application Of Support Vector Machine In Industrial Inferential Measurements
9	Some Research On Support Vector Machine
10	Support Vector Machine Learning Under Noisy And Overlapping Data