
Web Pages Classification Based On Multi-instance Multi-label Support Vector Machine

Posted on: 2015-11-12   Degree: Master   Type: Thesis
Country: China   Candidate: Y L Zhang
GTID: 2308330503475090   Subject: Computer Science and Technology
Abstract/Summary:
With the growing popularity of the Internet, the amount of information online is exploding, which makes it increasingly difficult for people to obtain useful information. Web page classification technology has emerged in response, helping people organize and use the mass of information on the Internet effectively. Among the many web page classification algorithms, the support vector machine under the multi-instance multi-label framework has become a focus of machine learning research because of its excellent classification ability.

This thesis introduces the general process and key techniques of web page classification and describes the development, basic principles, and commonly used training algorithms of the support vector machine. Common training methods for multi-instance multi-label learning are also studied, including the E-MIMLSVM+ algorithm under this framework. To make full use of the relationships between labels and instances and among the labels themselves, we propose an improved E-MIMLSVM+ algorithm based on ensembles of classifier chains, which improves both classification efficiency and classification accuracy. An ensemble of classifier chains can exploit the correlations between labels, and its random subset selection strategy keeps time and space complexity low. To address the small-sample problem in traditional supervised learning, we also propose a transductive SVM algorithm under the multi-instance multi-label framework, which is a semi-supervised learning method. This method trains the classifier on labeled and unlabeled samples at the same time and improves classification accuracy.

Finally, the improved training algorithms are applied to a web page classification system, and their performance is analyzed and compared. The experimental data show that the improved algorithms achieve higher efficiency and accuracy.
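
The ensemble-of-classifier-chains idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's E-MIMLSVM+ implementation: it assumes each web page has already been reduced to a single feature vector (the multi-instance bag handling is omitted), and it swaps in a simple perceptron where the thesis uses a support vector machine. All function names, parameters, and the toy data are hypothetical.

```python
import random

def train_perceptron(X, y, epochs=20):
    """Binary linear classifier trained with the perceptron rule (stand-in for an SVM)."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        for x, t in zip(X, y):
            if predict_linear(w, x) != t:
                step = 1.0 if t == 1 else -1.0
                for i, xi in enumerate(x):
                    w[i] += step * xi
                w[-1] += step
    return w

def predict_linear(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0 else 0

def train_chain(X, Y, order):
    """One classifier per label; each sees the features plus the labels earlier in the chain."""
    models = []
    for pos, label in enumerate(order):
        X_aug = [list(x) + [y[prev] for prev in order[:pos]] for x, y in zip(X, Y)]
        models.append(train_perceptron(X_aug, [y[label] for y in Y]))
    return models

def predict_chain(models, order, x):
    out = [0] * len(order)
    chain_inputs = list(x)
    for w, label in zip(models, order):
        out[label] = predict_linear(w, chain_inputs)
        chain_inputs.append(out[label])  # the predicted label feeds the next link
    return out

def train_ecc(X, Y, n_chains=5, subset_ratio=0.67, seed=0):
    """Each chain gets a random subset of the training data and a random label order."""
    rng = random.Random(seed)
    n, n_labels = len(X), len(Y[0])
    ensemble = []
    for _ in range(n_chains):
        idx = rng.sample(range(n), max(1, int(subset_ratio * n)))
        order = list(range(n_labels))
        rng.shuffle(order)
        models = train_chain([X[i] for i in idx], [Y[i] for i in idx], order)
        ensemble.append((models, order))
    return ensemble

def predict_ecc(ensemble, x, threshold=0.5):
    """Average the chains' votes per label and threshold the result."""
    votes = [0] * len(ensemble[0][1])
    for models, order in ensemble:
        for j, v in enumerate(predict_chain(models, order, x)):
            votes[j] += v
    return [1 if v / len(ensemble) >= threshold else 0 for v in votes]

# Toy data (made up): two features, two labels per page.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.0], [0.0, 0.0]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
ensemble = train_ecc(X, Y)
print(predict_ecc(ensemble, [0.95, 0.05]))
```

Because each chain trains on only a fraction of the samples, per-chain training cost drops, while the differing label orders let the ensemble capture label correlations that a single fixed order might miss.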
Keywords/Search Tags: Multi-instance Multi-label, Web page classification, Support Vector Machine, Ensembles of Classifier Chains, Semi-supervised learning