Web Pages Classification Based On Imbalanced Support Vector Machine Learning

Posted on: 2014-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: R X Chen
GTID: 2298330452962715
Subject: Computer Science and Technology
Abstract/Summary:
Information on the Internet is increasingly plentiful and varied, and web page classification is an important way to obtain useful information from it. Through continued study and validation, the support vector machine (SVM) has proved to be a highly efficient classification method. Among automatic classification algorithms, the SVM has favorable learning ability, and it has become a hot topic in the field of machine learning. However, its classification accuracy degrades badly when the data are imbalanced, so SVM learning on imbalanced data has important theoretical and practical value.

This thesis introduces the theory, principles, and techniques of the SVM, analyzes its merits and demerits, expounds SVM training algorithms, and discusses ways of training on imbalanced data. In an imbalanced SVM problem, the minority classes and the majority classes differ markedly in size. We propose a new screening model that narrows the gap between the two data sets by combining oversampling and undersampling: undersampling reduces the noisy data, and oversampling enlarges the smaller data set. This improves the accuracy of the imbalanced SVM. To significantly improve the classification performance of the hyper-sphere support vector machine, the penalty parameter C is refined for each sample using data distribution information, which adjusts the scale of the sphere and the error rate and improves classification precision.

Finally, the improved imbalanced SVM algorithms are applied to a Chinese web page classification system, and their results are compared and analyzed. The results indicate that the improved algorithms achieve higher classification accuracy.
Keywords/Search Tags: support vector machine, imbalanced data set, web page classification, data preprocessing, error rate
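
The two ideas named in the abstract, resampling the classes toward a common size and giving each sample its own penalty value C, can be illustrated with a minimal Python sketch. This is not the thesis's screening model or its C-refinement rule, neither of which is specified in the abstract. Plain random undersampling and oversampling stand in for the screening model, and a distance-to-centroid weighting stands in for the "data distribution information". The names `rebalance`, `per_sample_c`, `target_size`, and `base_c` are hypothetical.

```python
import random
from collections import Counter


def rebalance(samples, labels, target_size, seed=0):
    """Bring every class to target_size samples.

    Assumption: plain random resampling, used here only as a stand-in
    for the thesis's screening model.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) > target_size:
            # Undersampling: keep a random subset of the larger class.
            chosen = rng.sample(xs, target_size)
        else:
            # Oversampling: repeat random members of the smaller class.
            extra = [rng.choice(xs) for _ in range(target_size - len(xs))]
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target_size)
    return out_x, out_y


def per_sample_c(samples, base_c=1.0):
    """Return one penalty value per sample of a single class.

    Assumption: samples far from the class centroid are more likely to
    be noise, so they receive a smaller C (down to base_c / 2). This
    rule is hypothetical and is not taken from the thesis.
    """
    n, dim = len(samples), len(samples[0])
    centroid = [sum(x[d] for x in samples) / n for d in range(dim)]
    dists = [
        sum((x[d] - centroid[d]) ** 2 for d in range(dim)) ** 0.5
        for x in samples
    ]
    max_d = max(dists) or 1.0  # avoid division by zero for identical points
    return [base_c * (1.0 - 0.5 * d / max_d) for d in dists]


# Toy data: 8 majority-class points and 2 minority-class points.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 9.0)]
X = majority + minority
y = [0] * 8 + [1] * 2

X_bal, y_bal = rebalance(X, y, target_size=4)
c_values = per_sample_c(minority, base_c=1.0)
print(Counter(y_bal))
print(c_values)
```

In a real pipeline the per-sample values would be passed to an SVM trainer that accepts a separate penalty for each training sample. Which trainer to use, and how the hyper-sphere variant consumes these values, is left open here because the abstract does not say.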