Research And Application Of Text Classification Algorithm Based On SVM

Posted on: 2010-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: P Wu
GTID: 2178360302960779
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology and information networks, mining useful knowledge from large amounts of data has become an important research area. The support vector machine (SVM) is a learning method developed in recent years on the foundations of statistical learning theory (SLT). It is gaining popularity because of its many attractive features and its promising empirical performance in nonlinear and high-dimensional pattern recognition.
Although SLT has a solid theoretical foundation and rigorous theoretical analysis, many problems, from theory to application, still need to be studied and solved. For example, current research in the field tries to design classifiers that are expected to perform optimally on all possible samples. In many practical problems, however, this is neither possible nor necessary: often only some specific samples need to be classified. This calls for a more economical classifier that starts from labeled samples and identifies and classifies specific unlabeled samples. In contrast to traditional inductive inference, this is called transductive inference.
TSVM (transductive support vector machine) takes a particular test set into account and tries to minimize misclassifications on exactly those examples. PTSVM (progressive transductive support vector machine) can automatically adapt to different data distributions and realizes transductive learning of support vectors in a more general sense. However, PTSVM's pairwise labeling in the margin band is unnatural and prone to errors. Although its dynamical adjusting provides some error recovery, that ability is limited. To address these shortcomings of the PTSVM learning algorithm, this thesis presents the ICPTSVM (improved cache-based PTSVM) learning algorithm.
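The margin-band pairwise labeling step attributed to PTSVM above can be sketched as follows. This is a minimal illustration, assuming the commonly described formulation: given the current SVM decision values f(x) of the unlabeled samples, only samples with |f(x)| < 1 (inside the margin band) are candidates, and in each round the in-band sample with the largest positive f(x) is tentatively labeled +1 while the one with the most negative f(x) is labeled -1. The function name `pairwise_label_in_margin_band` and its dictionary input are hypothetical, not taken from the thesis.

```python
from typing import Dict, Optional, Tuple


def pairwise_label_in_margin_band(
    scores: Dict[int, float],
) -> Tuple[Optional[int], Optional[int]]:
    """Choose at most one +1 and one -1 candidate from the margin band.

    `scores` maps each unlabeled sample id to its current SVM decision
    value f(x). Samples with |f(x)| < 1 lie inside the margin band.
    Returns (id to label +1, id to label -1); None means that side of
    the band has no candidate.
    """
    band = {i: s for i, s in scores.items() if abs(s) < 1.0}
    positives = [i for i, s in band.items() if s > 0]
    negatives = [i for i, s in band.items() if s < 0]
    # Most confident in-band sample on each side (closest to f = +1 / f = -1).
    pos = max(positives, key=band.__getitem__) if positives else None
    neg = min(negatives, key=band.__getitem__) if negatives else None
    return pos, neg
```

For example, with scores `{0: 0.8, 1: 0.3, 2: -0.5, 3: -0.9, 4: 1.7, 5: -2.0}` the call returns `(0, 3)`: samples 4 and 5 lie outside the band, and 0 and 3 are the most confident in-band samples on each side.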
ICPTSVM replaces PTSVM's pairwise labeling in the margin band with pairwise labeling in the range, and replaces dynamical adjusting with error correction on a cache. As a result, it greatly reduces the number of mislabeled samples and improves both speed and accuracy, and it also eliminates the dead cycle (infinite loop) of the PTSVM learning algorithm. Experiments on the Wisconsin Breast Cancer dataset from UCI and the Svmguide3 dataset from CWH03a show that the algorithm is effective.
In this thesis, the improved cache-based PTSVM learning algorithm is applied to the full-text retrieval system of the Dalian police general application platform, where it significantly improves retrieval accuracy and work efficiency. Moreover, the system design and implementation described here are general, so they offer some guidance for implementing retrieval systems in other fields.
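The abstract does not specify how ICPTSVM's "pairwise labeling in the range" or its cache-based error correction work, so the following is only a hypothetical sketch of a cache-based progressive transductive loop, not the thesis's algorithm. It assumes: a linear SVM fitted by plain full-batch subgradient descent as a stand-in for a real SVM solver; PTSVM-style selection of one positive and one negative sample from the margin band per round (used here because "in the range" is not defined); a cache holding the tentatively labeled samples, whose labels are re-checked against the retrained classifier after every round; and a hard iteration cap as a simple guard against endless relabeling. All names and parameters (`train_linear_svm`, `cached_progressive_tsvm`, `lam`, `lr`, `epochs`, `max_iter`) are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def decision(w: Vec, b: float, x: Vec) -> float:
    """Linear decision value f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def sign(v: float) -> int:
    return 1 if v >= 0 else -1


def train_linear_svm(X: List[Vec], y: List[int], lam: float = 0.01,
                     lr: float = 0.05, epochs: int = 500) -> Tuple[List[float], float]:
    """Stand-in SVM: full-batch subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss (bias is not regularized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # hinge term is active
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def cached_progressive_tsvm(X_lab: List[Vec], y_lab: List[int],
                            X_unl: List[Vec], max_iter: int = 50) -> List[int]:
    """Hypothetical cache-based progressive transductive loop (illustrative only)."""
    cache: Dict[int, int] = {}  # index into X_unl -> tentative label
    w, b = train_linear_svm(X_lab, y_lab)
    for _ in range(max_iter):  # hard cap guards against endless relabeling
        scores = {i: decision(w, b, x) for i, x in enumerate(X_unl) if i not in cache}
        band = {i: s for i, s in scores.items() if abs(s) < 1}  # margin band
        pos = [i for i, s in band.items() if s > 0]
        neg = [i for i, s in band.items() if s < 0]
        if not pos and not neg:
            break  # nothing left to label in the margin band
        if pos:
            cache[max(pos, key=band.__getitem__)] = 1   # most confident positive
        if neg:
            cache[min(neg, key=band.__getitem__)] = -1  # most confident negative
        X_all = list(X_lab) + [X_unl[i] for i in cache]
        y_all = list(y_lab) + list(cache.values())
        w, b = train_linear_svm(X_all, y_all)
        # Hypothetical error correction on the cache: a cached label that
        # disagrees with the retrained classifier is replaced by its prediction.
        for i in cache:
            cache[i] = sign(decision(w, b, X_unl[i]))
    # Every unlabeled sample gets the final classifier's label; cached labels
    # already agree with it after the last correction pass.
    return [sign(decision(w, b, x)) for x in X_unl]
```

On two well-separated toy clusters, for instance labeled points `[2.0, 2.0]` (+1) and `[-2.0, -2.0]` (-1) with unlabeled points scattered around each, the loop labels the unlabeled points by cluster. The iteration cap is the crudest possible guard against cycling; it is shown only to make the loop terminate unconditionally, not as the mechanism the thesis uses.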
Keywords/Search Tags: Statistical Learning Theory, Support Vector Machine, Transductive Inference, Cache, Pairwise Labeling In The Range