Imbalanced Web-page Classification Based On Multi-instance Multi-label Support Vector Machine

Posted on: 2018-05-12  Degree: Master  Type: Thesis
Country: China  Candidate: L Tang  Full Text: PDF
GTID: 2428330596469804  Subject: Computer technology
Abstract/Summary:
With the spread of the Internet, the web has become the main way people obtain information. Automatic web page classification emerged to help people find useful information among massive numbers of web pages. Because the multi-instance multi-label (MIML) framework has a unique advantage in learning from ambiguous objects, and the support vector machine (SVM) has excellent learning ability, algorithms that fuse the two have become a research hotspot in machine learning. However, existing combinations of the two do not adequately handle class imbalance among web pages.

This thesis introduces the basic process of web page classification and its related technologies, describes the theory and algorithms of the MIML framework, discusses the development history and theory of SVM, and analyzes the MIMLSVM and MIMLSVM+ algorithms developed under the MIML framework. Sample collections often contain significantly more samples of one class than of the others, and MIMLSVM classifies such imbalanced samples poorly. To address this, an improved random sampling method is proposed to preprocess the samples, reducing the impact of sample imbalance on the model.

In practice, labeled samples are scarce while unlabeled samples are plentiful. The distribution of a large number of unlabeled samples provides information about the whole sample space, compensating for the difficulty of describing that distribution with only a few labeled samples, and can therefore improve classifier performance. To address the poor classification performance of MIMLSVM+ on such imbalanced samples, a transductive support vector machine based on quadratic programming is proposed to make use of the unlabeled samples, improving classification accuracy. Finally, the improved training algorithm is applied to a web page classification system. The experimental results show that the proposed algorithm achieves higher classification efficiency and accuracy.
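
The abstract does not describe how the improved random sampling method works, so the following is only a minimal sketch of the plain baseline it presumably builds on: random undersampling, which discards majority-class samples until every class is the same size. The function name `random_undersample`, the toy page names, and the single-label simplification are assumptions made for illustration and are not taken from the thesis, whose MIML setting attaches several labels to each page.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest class.

    Hypothetical helper: plain random undersampling, not the improved
    method proposed in the thesis.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        # Draw `target` samples from this class without replacement.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 6 "news" pages versus 2 "sports" pages.
pages = [f"page{i}" for i in range(8)]
tags = ["news"] * 6 + ["sports"] * 2
balanced = random_undersample(pages, tags)
```

With this toy data the result keeps two pages per class. Undersampling throws away majority-class information, which is one reason an improved sampling scheme may be preferable in practice.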
Keywords/Search Tags:Multi-instance Multi-label Learning, Support Vector Machine, Imbalanced Data, Web-pages Classification