Multi-instance Multi-label Web Pages Classification Based On Support Vector Machine

Posted on: 2018-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: H B Zhu
GTID: 2428330596968736
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the number of web pages has grown dramatically, and web page classification has become a research focus in machine learning. The Multi-instance Multi-label (MIML) learning framework has strong expressive power and is well suited to web page classification. However, current SVM-based MIML algorithms cannot make use of unlabeled instances, and MIML algorithms that rely on a degradation strategy may lose information about the relationships between labels. This thesis introduces the techniques related to web page classification, explains the basic principles of the SVM, and discusses the MIML framework.

In real-world problems, labeling samples is often costly, whereas unlabeled instances are much easier to obtain. The E-MIMLSVM+ algorithm, however, cannot use unlabeled instances when building its model. This thesis improves it with a semi-supervised SVM, so that the algorithm can learn from a small number of labeled instances together with a large number of unlabeled ones. Using the unlabeled data helps uncover hidden structure in the data and better reflect the true distribution of the sample set, which can improve the generalization performance of the classifier.

A simple way to tackle MIML is to reduce it to an equivalent problem in the traditional supervised learning framework, using either multi-instance learning or multi-label learning as the bridge. Such a reduction may lose the correlation information between labels and hurt classification in practice. The MIMLSVM algorithm follows this degradation strategy and therefore loses some information. This thesis improves MIMLSVM with the ML-LOC approach, which allows label correlations to be exploited locally; by using these local label correlations, the improved algorithm achieves better accuracy.

Finally, this thesis designs a web page classification system to test and evaluate the improved algorithm. The experimental results show that the improved algorithm increases the accuracy and generalization ability of the classifier.
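The degradation strategy criticized above can be made concrete. In the usual formulation of MIMLSVM, whole bags are clustered with k-medoids under the Hausdorff distance, and each bag is then replaced by the vector of its distances to the k medoids; the resulting single-instance multi-label problem is handed to one binary SVM per label. The sketch below covers only the bag-to-vector step, assuming bags are lists of numeric tuples. The function names, the toy data and the use of the maximal Hausdorff distance are illustrative assumptions, not code from the thesis, and neither the per-label SVMs nor the ML-LOC extension is shown.

```python
import math
import random


def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags (lists of instance vectors)."""
    forward = max(min(math.dist(a, b) for b in bag_b) for a in bag_a)
    backward = max(min(math.dist(a, b) for a in bag_a) for b in bag_b)
    return max(forward, backward)


def k_medoids(bags, k, iters=20, seed=0):
    """k-medoids clustering of whole bags under the Hausdorff distance.

    Returns the indices of the k medoid bags.
    """
    n = len(bags)
    dist = [[hausdorff(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(iters):
        # Assignment step: a medoid keeps itself; every other bag joins its
        # nearest medoid. This guarantees no cluster is ever empty.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest total
        # distance to the rest of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return medoids


def bags_to_vectors(bags, medoid_bags):
    """Degrade each bag to a fixed-length vector: its distances to the k medoids."""
    return [[hausdorff(bag, m) for m in medoid_bags] for bag in bags]


# Toy example (hypothetical data): two well-separated groups of 2-D instances.
bags = [
    [(0.0, 0.0), (0.1, 0.2)],
    [(0.2, 0.1)],
    [(5.0, 5.0), (5.1, 4.9)],
    [(4.8, 5.2)],
]
medoids = k_medoids(bags, k=2)
vectors = bags_to_vectors(bags, [bags[m] for m in medoids])
```

Each bag now becomes an ordinary k-dimensional feature vector. Because the binary SVMs trained on these vectors are independent of one another, correlations between labels are not modeled, which is the gap the ML-LOC-based improvement described above is meant to close.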
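The abstract does not say which semi-supervised SVM variant is used to extend E-MIMLSVM+. As a stand-in, the sketch below shows self-training, one of the simplest ways to let an SVM learn from unlabeled data. It trains on the labeled set, pseudo-labels the unlabeled points the classifier is most confident about, and retrains. It is a binary, single-instance illustration under stated assumptions: a linear SVM trained by Pegasos-style subgradient descent, a bias folded in as a constant feature, and hypothetical function names, parameters and toy data. It is not the thesis's algorithm.

```python
import random


def _augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary (regularized) weight.
    return list(x) + [1.0]


def decision(w, x):
    """Signed SVM score w . [x, 1]."""
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM; labels y are -1/+1."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xa, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, xa))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: step towards this example
                w = [wj + eta * label * xj for wj, xj in zip(w, xa)]
    return w


def self_train(X_lab, y_lab, X_unlab, rounds=3, per_round=2):
    """Self-training: repeatedly pseudo-label the most confident unlabeled points."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    pseudo = []
    for _ in range(rounds):
        if not pool:
            break
        w = train_linear_svm(X, y)
        # Confidence = distance from the decision boundary (absolute score).
        pool.sort(key=lambda x: abs(decision(w, x)), reverse=True)
        chosen, pool = pool[:per_round], pool[per_round:]
        for x in chosen:
            label = 1 if decision(w, x) >= 0 else -1
            X.append(x)
            y.append(label)
            pseudo.append((x, label))
    return train_linear_svm(X, y), pseudo


# Toy example (hypothetical data): two labeled points, four unlabeled ones.
labeled = [(2.0, 2.0), (-2.0, -2.0)]
labels = [1, -1]
unlabeled = [(1.5, 2.5), (2.5, 1.5), (-1.5, -2.5), (-2.5, -1.5)]
w, pseudo = self_train(labeled, labels, unlabeled)
```

Self-training is used here only because it is short and dependency-free. Transductive SVMs instead include the unlabeled points directly in the margin optimization, whereas in self-training a wrong early pseudo-label is never revisited.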
Keywords/Search Tags: Web page classification, Multi-instance Multi-label, SVM, Semi-supervised learning, Label correlations