
Research On Key Technologies For Multi-instance Multi-label Web Page Categorization

Posted on: 2019-05-16  Degree: Master  Type: Thesis
Country: China  Candidate: C C Tian
GTID: 2428330626956579  Subject: Computer technology
Abstract/Summary:
With the development of information technology, the volume of information on the Internet is growing exponentially. Extracting the needed information from the network quickly and effectively has become an urgent problem, and web page classification technology has emerged in response. It categorizes and organizes web pages, helping people manage and use massive amounts of information effectively. Among the many web page classification algorithms, the support vector machine under the multi-instance multi-label framework has become a research hotspot in machine learning because of its strong learning ability.

This thesis first introduces the basic process, related technologies, and common algorithms of web page classification, explains the basic principle of the support vector machine, describes the basic theory of the multi-instance multi-label framework, and analyzes the support vector machine classification algorithm under that framework.

To address the inability of the multi-instance multi-label algorithm to exploit dependencies between labels, this thesis proposes the OCC-MIMLSVM+ algorithm, based on the idea of an ordered classifier chain. The algorithm arranges the classifiers in a chain and integrates the dependencies between labels into the training process, so that the trained classification model can exploit those dependencies, which improves classification accuracy. To address the inability of the multi-instance multi-label algorithm to use unlabeled samples, this thesis proposes the S4VM-MIMLSVM+ algorithm, based on the semi-supervised S4VM algorithm. The algorithm integrates the idea of S4VM into the multi-instance multi-label algorithm, taking advantage of a large number of unlabeled samples when training the classifier. At the same time, multiple low-density classifiers are considered, which mitigates the performance degradation that semi-supervised learning can cause and further improves the generalization ability of the algorithm.

Finally, the improved training algorithms are applied to a web page classification system. Experimental results show that the algorithms achieve higher efficiency and accuracy.
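The classifier-chain idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the OCC-MIMLSVM+ algorithm itself: it assumes plain feature vectors instead of multi-instance bags, and it swaps in a toy nearest-centroid learner where the thesis uses a support vector machine. The names NearestCentroid and ClassifierChain, and the order parameter, are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Sequence


class NearestCentroid:
    """Toy binary base learner (stand-in for an SVM): picks the closer class mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for cls in (0, 1):
            rows = [x for x, t in zip(X, y) if t == cls]
            if rows:  # skip a class that never occurs in the training data
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        def dist2(c: Sequence[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda cls: dist2(self.centroids[cls]))


class ClassifierChain:
    """One binary classifier per label, trained in a fixed label order.

    Assumes `order` is a permutation of the label indices 0..L-1.
    """

    def __init__(self, order: Sequence[int]):
        self.order = list(order)
        self.models: List[NearestCentroid] = []

    def fit(self, X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]) -> "ClassifierChain":
        self.models = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Append the TRUE values of earlier labels to the features, so this
            # classifier can learn how its label depends on them.
            X_aug = [list(x) + [float(y[p]) for p in earlier] for x, y in zip(X, Y)]
            targets = [y[label] for y in Y]
            self.models.append(NearestCentroid().fit(X_aug, targets))
        return self

    def predict_one(self, x: Sequence[float]) -> List[int]:
        pred: Dict[int, int] = {}
        for pos, label in enumerate(self.order):
            # At prediction time the true labels are unknown, so the chain
            # feeds in the PREDICTED values of the earlier labels instead.
            x_aug = list(x) + [float(pred[p]) for p in self.order[:pos]]
            pred[label] = self.models[pos].predict_one(x_aug)
        return [pred[j] for j in sorted(pred)]


# Made-up data: label 1 always co-occurs with label 0.
X = [[0.0], [0.1], [0.9], [1.0]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
chain = ClassifierChain(order=[0, 1]).fit(X, Y)
print(chain.predict_one([0.95]))
print(chain.predict_one([0.05]))
```

The order in which labels are chained determines which dependencies each classifier can see, which is presumably why the thesis emphasizes an ordered chain; how that order is chosen is not stated in the abstract.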
Keywords/Search Tags: Multi-instance Multi-label Learning, Web Page Classification, Support Vector Machine, Classifier Chain, Semi-supervised Learning