
Web-page Classification Method Based On Multi-instance Multi-label

Posted on: 2017-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wang
GTID: 2348330566957316
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of informatization, the amount of information on the Internet is increasing exponentially. How to extract needed information from the network quickly and effectively has become an urgent issue. To improve the efficiency of information extraction from massive numbers of web pages, web-page classification technology is used to group pages by category. Web-page classification narrows the scope of the search target and saves valuable time. Given the particular advantage of the Multi-instance Multi-label (MIML) framework in learning from ambiguous objects and the strong learning ability of the support vector machine (SVM), algorithms that fuse the two have become a hotspot in machine learning.

This thesis briefly introduces the basic process of web-page classification and its related technologies, and explains the theory and algorithms of MIML and SVM as well as their combination. Existing algorithms usually degenerate MIML problems into single-instance multi-label (SIML) or multi-instance single-label (MISL) problems, but information is lost in the degeneration process. To reduce this loss, the thesis uses the Ensemble Label-dependencies of Classifier Trees (ELDCT) algorithm in combination with the MIMLSVM+ algorithm to improve classification accuracy, so that the dependencies between labels are incorporated into classifier training.

In practice, labeled samples are expensive to annotate and few in number, and, worse, they cannot fully reflect the actual distribution of the samples. Unlabeled samples are the opposite. Although unlabeled samples have these advantages, they are often not used effectively, and a classifier trained on a small number of labeled samples performs poorly. Therefore, to use unlabeled samples to estimate the sample distribution more accurately, the thesis integrates transductive learning into the MIMLSVM+ algorithm. On the one hand, it uses Support Vector Domain Description (SVDD) in place of pairwise labeling; on the other hand, it introduces incremental learning. These strategies both accelerate the convergence of the algorithm and enhance its generalization ability, further improving classifier performance.

Finally, to verify the algorithms in application, the thesis designs a web-page classification system based on the improved algorithms and evaluates it experimentally. The results show that the improved algorithms achieve better classification performance.
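The degeneration step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a MIMLSVM-style reduction in which each bag of instances (for example, the text blocks of one web page) is replaced by a single vector of Hausdorff distances to a few reference bags, which turns the MIML problem into a single-instance multi-label one. The abstract does not give the thesis's exact procedure, so the function names, the toy data, and the choice of reference bags below are all hypothetical; in particular, the reference bags are simply the first two bags rather than medoids found by clustering.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two instance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags of instance vectors."""
    def directed(src, dst):
        # Largest distance from any instance in src to its nearest instance in dst.
        return max(min(euclidean(x, y) for y in dst) for x in src)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))

def bags_to_vectors(bags, reference_bags):
    """Map every bag to one vector: its distances to each reference bag."""
    return [[hausdorff(bag, ref) for ref in reference_bags] for bag in bags]

# Toy data (hypothetical): three "pages", each a bag of 2-D instances,
# and each carrying a set of labels.
bags = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0]],
    [[5.0, 5.0], [6.0, 5.0]],
]
labels = [{"sports"}, {"sports", "news"}, {"tech"}]

# Assumption: use the first two bags as reference bags instead of
# running k-medoids clustering.
reference_bags = bags[:2]
vectors = bags_to_vectors(bags, reference_bags)

for vec, labs in zip(vectors, labels):
    print([round(d, 3) for d in vec], sorted(labs))
```

After this transformation each page is one fixed-length vector with several labels, so any multi-label learner can be trained on it. The information lost in the process is visible here: the individual instances inside each bag are no longer available to the learner, only their distances to the reference bags.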
Keywords/Search Tags: Web-page Classification, Multi-instance Multi-label Learning, Support Vector Machine, Label Dependencies, Transductive Learning