Web-page Classification Method Based On Multi-instance Multi-label

Posted on:2017-07-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Wang

Full Text:PDF

GTID:2348330566957316

Subject:Computer Science and Technology

Abstract/Summary:

With the development of informatization, the amount of information on the Internet is growing exponentially. How to extract the needed information from the network quickly and effectively has become an urgent issue. To improve the efficiency of information extraction from massive numbers of web pages, people use web-page classification technology to categorize them, which narrows the scope of the search target and saves valuable time. Given the particular advantage of the Multi-instance Multi-label (MIML) framework in learning from ambiguous objects and the excellent learning ability of the support vector machine (SVM), algorithms that fuse the two have become a hotspot in machine learning.

This thesis briefly introduces the basic process of web-page classification and its related technologies, and expounds the theory and algorithms of MIML and SVM as well as their combination. Existing algorithms usually degrade MIML problems to single-instance multi-label (SIML) or multi-instance single-label (MISL) problems, but information is always lost in the degradation. To reduce this loss, this thesis uses the Ensemble Label-dependencies of Classifier Trees (ELDCT) algorithm in combination with the MIMLSVM+ algorithm, so that the dependencies between labels are incorporated into the training of the classifier and classification accuracy improves.

In practice, labeled samples are expensive to annotate and few in number, and, what is worse, they cannot fully reflect the actual distribution of the data. Unlabeled samples are the opposite. Although unlabeled samples have many advantages, they are often not used effectively, and a classifier trained on a small number of labeled samples performs poorly. Therefore, to use unlabeled samples to estimate the sample distribution more accurately, this thesis integrates transductive learning into the MIMLSVM+ algorithm. On the one hand, Support Vector Domain Description (SVDD) is used instead of pairwise labeling; on the other hand, incremental learning is introduced. These strategies not only accelerate the convergence of the algorithm but also enhance its generalization ability, further improving classifier performance.

Finally, to verify the practical effect of the algorithms, this thesis designs a web-page classification system based on the improved algorithms and evaluates it experimentally. The results show that the improved algorithms achieve better classification performance.
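
The degradation of a MIML problem to MISL mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `degrade_to_misl`, the toy feature vectors, and the label names are all assumptions made for illustration. Each web page is treated as a bag of instance vectors with a set of labels, and the dataset is split into one binary multi-instance task per label.

```python
from typing import Dict, List, Set, Tuple

# A MIML example: a bag of instance feature vectors plus a set of labels.
Bag = List[List[float]]
MimlExample = Tuple[Bag, Set[str]]


def degrade_to_misl(dataset: List[MimlExample]) -> Dict[str, List[Tuple[Bag, int]]]:
    """Split a MIML dataset into one binary multi-instance task per label.

    Hypothetical helper: every bag is kept intact, but its label set is
    replaced by a single 0/1 target for the label of the current task.
    """
    labels = sorted({label for _, tags in dataset for label in tags})
    return {
        label: [(bag, 1 if label in tags else 0) for bag, tags in dataset]
        for label in labels
    }


# Hypothetical toy data: each page is a bag of two-dimensional section vectors.
pages: List[MimlExample] = [
    ([[0.9, 0.1], [0.8, 0.3]], {"sports", "news"}),
    ([[0.2, 0.7]], {"finance"}),
    ([[0.6, 0.4], [0.1, 0.9]], {"news", "finance"}),
]

tasks = degrade_to_misl(pages)
for label, examples in tasks.items():
    # Print the binary targets each per-label task would be trained on.
    print(label, [target for _, target in examples])
```

Because each per-label task is then trained independently, the fact that two labels tend to occur on the same page is not visible to any single task. This is one example of the kind of information that can be lost in degradation and that a label-dependency method aims to recover.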

Keywords/Search Tags:

Web-page Classification, Multi-instance Multi-label Learning, Support Vector Machine, Label Dependencies, Transductive Learning

Related items

1	Multi-instance Multi-label Web Pages Classification Based On Support Vector Machine
2	Web Pages Classification Based On Multi-instance Multi-label Support Vector Machine
3	Research On Key Technologies For Multi-instance Multi-label Web Page Categorization
4	Web Pages Classification Technology Based On Multi-instance Multi-label Support Vector Machine
5	Multi-instance And Multi-label Web Page Classification Research Based On Support Vector Machine
6	Imbalanced Web-page Classification Based On Multi-instance Multi-label Support Vector Machine
7	Research On Web Text Mining Based For Multi-instance Multi-label Classification
8	Research Of Image Classification Based On Multi-instance Learning
9	Research On Image Classification Algorithms Based On The Transductive Multi-instance Learning
10	Research On Multi-label Data Classification Technology