
Web Pages Classification Technology Based On Multi-instance Multi-label Support Vector Machine

Posted on: 2019-03-24  Degree: Master  Type: Thesis
Country: China  Candidate: H You  Full Text: PDF
GTID: 2348330542973600  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of information available online has exploded. As the Internet has moved from information scarcity to information overload, people can no longer obtain the information they need quickly and accurately. Technologies for sorting and classifying information have been developed in response, and automatic web page classification is one of them. It helps people find the information they need quickly and efficiently in a vast ocean of information. Because of their strong learning ability, Support Vector Machines (SVMs) under the multi-instance multi-label (MIML) learning framework have become a hotspot among web page classification algorithms.

Most web page classification algorithms learn under the single-instance single-label framework. Real web page classification, however, is a multi-instance multi-label problem: the document content of a web page generally consists of several parts, and a single page may carry several topic labels at once, such as sports, celebrities, entertainment, tourism, and economy. This thesis first introduces the basic process of web page classification and the related key technologies, analyzes the principle of SVMs under the MIML learning framework, elaborates the degeneration strategy of the MIML framework, and studies a recent SVM algorithm, E-MIMLSVM+. The MIML SVM algorithm is then improved to address the information it loses, namely the relationship between labels and samples and the correlations among labels.

A semi-supervised SVM algorithm is proposed to address the performance degradation that can occur when unlabeled data are used. To address the small-sample problem of traditional supervised learning, the thesis proposes a classification algorithm based on the MIML framework that is a semi-supervised learning method and can use a large number of unlabeled samples to train the classification model. With many unlabeled samples taking part in training, the resulting model reflects the distribution of the sample data more accurately and produces more suitable classification output for new samples, which effectively improves the performance of the classification algorithm. Finally, a Chinese web page classification system is designed as an engineering application and tested on web page samples collected from the Internet. The results show that the improved classification algorithm performs better.
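The abstract does not spell out the degeneration strategy, so the following is only a minimal sketch of one common way to reduce a MIML problem to ordinary supervised tasks, in the style of MIMLSVM: each bag of instances is mapped to a vector of Hausdorff distances to a few representative bags, and the label sets are split into one binary task per label. All function names and the toy data are hypothetical, the representative bags are picked by hand rather than by clustering, and the per-label SVM training itself is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two instance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags of instances."""
    def directed(src, dst):
        # Farthest any instance in src is from its nearest instance in dst.
        return max(min(euclidean(s, d) for d in dst) for s in src)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))


def bags_to_vectors(bags, medoids):
    """Multi-instance -> single-instance: one distance feature per medoid bag."""
    return [[hausdorff(bag, m) for m in medoids] for bag in bags]


def binary_tasks(label_sets, all_labels):
    """Multi-label -> single-label: one +1/-1 target list per label."""
    return {lab: [1 if lab in s else -1 for s in label_sets]
            for lab in all_labels}


# Toy data (assumed): each web page is a bag of 2-D instance vectors.
bags = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0]],
    [[5.0, 5.0], [6.0, 5.0]],
]
label_sets = [{"sports"}, {"sports", "entertainment"}, {"economy"}]

# Assumption: representative bags chosen by hand; a full pipeline would
# typically obtain them by k-medoids clustering over the bags.
medoids = [bags[0], bags[2]]

vectors = bags_to_vectors(bags, medoids)
targets = binary_tasks(label_sets, ["sports", "entertainment", "economy"])
```

Each row of `vectors` together with one entry of `targets` forms an ordinary binary training set, so any standard SVM trainer could be applied per label. Treating labels independently in this way is exactly where the label correlations mentioned in the abstract are lost.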
Keywords/Search Tags: Multi-instance Multi-label learning, Web pages classification, Support Vector Machine, S3VM