Web Pages Classification Technology Based On Multi-instance Multi-label Support Vector Machine

Posted on:2019-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:H You

GTID:2348330542973600

Subject:Electronics and Communications Engineering

Abstract/Summary:

With the rapid development of the Internet, online information resources have grown explosively. As the situation has shifted from information scarcity to information overload, people can no longer obtain the information they need quickly and accurately. Information sorting, classification and related technologies were therefore developed, and automatic web page classification emerged. It helps people find the information they need quickly and efficiently in a vast ocean of information. Support vector machines (SVMs) under the multi-instance multi-label (MIML) learning framework have become a focus of web page classification research because of their strong learning ability. Most web page classification algorithms are designed under the single-instance single-label framework, but real web page classification is a multi-instance multi-label problem: the content of a web page generally consists of multiple parts, and a single page may carry several topic labels at once, such as sports, celebrities, entertainment, tourism and economy.

This thesis first introduces the basic process of web page classification and its key technologies, analyzes the principle of SVMs under the MIML learning framework, then describes the degeneration strategies of the MIML framework and studies E-MIMLSVM+, a recent MIML support vector machine algorithm. Because the MIML support vector machine loses information, namely the correlation between labels and samples and the correlation among the labels themselves, the MIML support vector machine algorithm is improved. A semi-supervised SVM (S3VM) algorithm is then proposed to address the performance degradation that can occur when unlabeled data are used. To address the small-sample problem of traditional supervised learning, the thesis proposes a classification algorithm based on the MIML framework that follows the semi-supervised learning paradigm and can train the classification model with a large number of unlabeled samples. Including these unlabeled samples lets the trained model reflect the distribution of the sample data more accurately, so that it is better suited to classifying new samples and the performance of the classification algorithm is effectively improved.

Finally, a Chinese web page classification system is designed as an engineering application and tested on web page samples collected from the Internet. The results show that the improved classification algorithm performs better.
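The MIML degeneration strategy referred to in the abstract can be illustrated with a short sketch. The Python code below follows the general shape of the classic MIMLSVM degeneration: bags (web pages, each a set of instance vectors) are clustered with k-medoids under the Hausdorff distance, each bag is re-encoded as a vector of distances to the medoid bags (turning MIML into single-instance multi-label learning), and the label sets are then split into one binary target per label so that an ordinary binary SVM could be trained for each. This is an illustrative sketch under stated assumptions, not the thesis's E-MIMLSVM+ or its improved semi-supervised algorithm; the function names, the farthest-first initialization and the toy bags and topic labels are hypothetical, and the SVM training step itself is omitted.

```python
import math
from typing import Dict, List, Sequence, Set

Instance = Sequence[float]
Bag = List[Instance]


def euclidean(a: Instance, b: Instance) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hausdorff(bag_a: Bag, bag_b: Bag) -> float:
    """Maximal Hausdorff distance between two bags (sets of instance vectors)."""
    a_to_b = max(min(euclidean(a, b) for b in bag_b) for a in bag_a)
    b_to_a = max(min(euclidean(b, a) for a in bag_a) for b in bag_b)
    return max(a_to_b, b_to_a)


def k_medoids(dist: List[List[float]], k: int, max_iter: int = 20) -> List[int]:
    """Cluster bags, given a precomputed distance matrix, into k groups.

    Hypothetical choice: deterministic farthest-first initialisation, then
    alternate assignment and medoid-update steps until medoids stop changing.
    """
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:
        far = max((i for i in range(n) if i not in medoids),
                  key=lambda i: min(dist[i][m] for m in medoids))
        medoids.append(far)
    for _ in range(max_iter):
        # Assignment step: each bag joins its nearest medoid.
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Update step: the new medoid minimises total distance within its cluster
        # (an empty cluster keeps its old medoid).
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][o] for o in members)) if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return medoids


def degenerate(bags: List[Bag], k: int) -> List[List[float]]:
    """MIML -> single-instance multi-label: represent each bag by its
    Hausdorff distances to the k medoid bags."""
    dist = [[hausdorff(a, b) for b in bags] for a in bags]
    medoids = k_medoids(dist, k)
    return [[dist[i][m] for m in medoids] for i in range(len(bags))]


def binary_relevance(label_sets: List[Set[str]], labels: List[str]) -> Dict[str, List[int]]:
    """Multi-label -> one binary (+1 / -1) target list per label,
    each of which could be handed to an ordinary binary SVM."""
    return {lab: [1 if lab in s else -1 for s in label_sets] for lab in labels}


if __name__ == "__main__":
    # Hypothetical toy data: each page is a bag of 2-D feature vectors
    # (e.g. one vector per text block) with a set of topic labels.
    pages = [
        [[0.0, 0.0], [0.0, 1.0]],
        [[0.0, 0.5], [1.0, 0.0]],
        [[10.0, 10.0], [10.0, 11.0]],
        [[11.0, 10.0], [10.0, 10.5]],
    ]
    topics = [{"sports"}, {"sports", "entertainment"}, {"economy"}, {"economy", "tourism"}]
    print(degenerate(pages, k=2))
    print(binary_relevance(topics, ["sports", "entertainment", "economy", "tourism"]))
```

Running the script prints one 2-dimensional vector per page and a +1/-1 target list per topic; the first two pages lie close to one medoid and the last two close to the other. The number of medoids k is a free parameter in this sketch and would normally be tuned on validation data.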

Keywords/Search Tags:

Multi-instance Multi-label learning, Web pages classification, Support Vector Machine, S3VM

Related items

1	Web Pages Classification Based On Multi-instance Multi-label Support Vector Machine
2	Multi-instance Multi-label Web Pages Classification Based On Support Vector Machine
3	Web-page Classification Method Based On Multi-instance Multi-label
4	Research On Key Technologies For Multi-instance Multi-label Web Page Categorization
5	Multi-instance And Multi-label Web Page Classification Research Based On Support Vector Machine
6	Imbalanced Web-page Classification Based On Multi-instance Multi-label Support Vector Machine
7	Research On Web Text Mining Based For Multi-instance Multi-label Classification
8	Research Of Image Classification Based On Multi-instance Learning
9	Research And Application On Multi-Instance Learning Using Support Vector Machine
10	Research On Metric Learning Based Support Vector Machine Algorithm And Its Applications