Multi-instance And Multi-label Web Page Classification Research Based On Support Vector Machine

Posted on: 2020-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Zhang
Full Text: PDF
GTID: 2518306500987049
Subject: Computer technology
Abstract/Summary:
With the rapid development of the information era, the amount of information on the Internet has grown exponentially. How to efficiently obtain the information people need from this mass of content is an urgent problem. Web page classification, a machine learning technique, can classify and organize web pages to help people extract and use the massive amount of information on the Internet effectively. Among the many web page classification algorithms, the multi-instance multi-label (MIML) learning framework based on the support vector machine (SVM) has become a research hotspot in machine learning because of its excellent learning ability. This thesis introduces the web page classification process and related technologies, explains the basic principles of the MIML learning framework and the support vector machine, and analyzes SVM-based classification algorithms within the MIML framework. A commonly used strategy in MIML algorithms is degeneration: using either multi-instance learning or multi-label learning as a bridge, the MIML problem is transformed into a traditional supervised learning problem. To address the information loss that the MIMLSVM algorithm incurs during degeneration, this thesis draws on the idea of the GLOCAL algorithm and proposes the MIMLSVM-GLOCAL algorithm, which brings label correlations into the MIML algorithm and thereby improves classification accuracy. To address the inability of the MIML algorithm E-MIMLSVM+ to use unlabeled samples for modeling, the CSE-MIMLSVM+ algorithm is proposed, drawing on the idea of the semi-supervised support vector machine algorithm CS4VM. This algorithm integrates semi-supervised learning into the MIML algorithm and makes full use of a large number of unlabeled samples to train the classification model, which improves its generalization performance. Finally, a Chinese web page classification system is designed and implemented, and the improved algorithms are applied to it. The experimental results show that the proposed algorithms have clear advantages in classification accuracy and generalization performance.
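
The degeneration strategy mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the transformation, so the code below rests on assumptions: it follows a common description of MIMLSVM-style degeneration, in which each bag of instances (for example, the text blocks of one web page) is mapped to a single vector of Hausdorff distances to a set of representative bags (medoids). The names `hausdorff` and `bags_to_vectors` are hypothetical, and the medoids are picked by hand here rather than by clustering.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two instance vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags (lists of instance vectors)."""
    def directed(src, dst):
        # Farthest that any instance in src lies from its nearest instance in dst.
        return max(min(euclidean(s, d) for d in dst) for s in src)

    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))


def bags_to_vectors(bags, medoids):
    """Map each bag to one vector: its Hausdorff distance to every medoid bag.

    After this step each example is a single fixed-length vector, so the
    multi-instance part of the problem has been removed.
    """
    return [[hausdorff(bag, m) for m in medoids] for bag in bags]


# Toy data (assumption): three "pages", each a bag of 2-D instance vectors.
bags = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0]],
    [[5.0, 5.0], [6.0, 5.0]],
]

# Assumption: medoids chosen by hand; a clustering step would normally pick them.
medoids = [bags[0], bags[2]]

vectors = bags_to_vectors(bags, medoids)
```

Each row of `vectors` could then be passed to an ordinary multi-label learner, for instance one binary SVM per label. Collapsing a bag to one vector and treating labels independently is also where information can be lost, which is the weakness the abstract says MIMLSVM-GLOCAL targets.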
Keywords/Search Tags: Web page classification, Multi-instance multi-label, SVM, Label correlations, Semi-supervised