Research On Multi-label Learning And Algorithms Based On Data And Label Correlations

Posted on: 2015-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Qiu
Full Text: PDF
GTID: 2268330425996306
Subject: Computer software and theory
Abstract/Summary:
Multi-label learning originates from ambiguity in document classification: because of this ambiguity, an instance may belong to several categories at once. Multi-label problems arise widely in practice. Multi-label learning has gradually become a research hotspot in the international machine learning community and has been applied in many fields. Further research on it has spawned new research problems, such as the multi-instance and multi-label ranking problems, so research on multi-label learning is of real significance.

Firstly, this thesis introduces the background and significance of multi-label learning, as well as the current state of research and the problems it faces. Secondly, it introduces the framework and evaluation methods of multi-label learning. Finally, it describes several typical multi-label learning algorithms in detail. The thesis studies multi-label learning mainly from two aspects: data correlation and label correlation. On the basis of these investigations, it proposes corresponding algorithms to solve problems encountered in multi-label learning and verifies their effectiveness through a large number of experimental comparisons.

This thesis completes the following tasks on multi-label learning:

(1) Based on the research on data correlation, a modified algorithm with label-specific features for multi-label learning is proposed. Instances follow certain distributional characteristics, and data with the same label tend to cluster together. The label information of one instance may therefore provide useful information about other instances, especially when data are relatively scarce.
Using the relationship between labeled and unlabeled instances can avoid errors caused by insufficient data. LIFT is a multi-label learning algorithm with label-specific features. In LIFT, the feature sets are generated by giving every instance equal weight, so the algorithm ignores the correlation among instances. The more strongly two instances are correlated, the more likely they are to share a label. Based on the research on data correlation, we propose the W-LIFT algorithm to solve the multi-label learning problem. The algorithm considers the correlation between instances and obtains more accurate feature sets by weighting instances. Experimental results show that the modified algorithm works better than other commonly used multi-label algorithms.

(2) Based on the research on label correlation, the locally ordinal classifier chain algorithm for multi-label learning is proposed. In multi-label learning, one label may provide useful information about other related labels, especially those with only a small number of training instances. Considering the correlation between labels can help solve the problems caused by insufficient data. The correlation among labels plays an important role in classification problems, and recent investigations have taken label correlation into account during multi-label learning. In a classifier chain, label information is added to the attribute space and provides useful information for the other labels during classification. However, because the order of the classifiers in the chain is random, the classification results are uncertain and unstable.
A random classifier order may also propagate erroneous label information along the chain. Based on the research on label correlation and classifier chains, we propose the LOCC algorithm, which fully considers the local distribution of instance labels. The algorithm estimates the labels related to an instance from a probabilistic perspective and sorts the classifiers according to the size of these probabilities. Experimental results show that the LOCC algorithm works better than other commonly used multi-label algorithms.
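
The local ordering idea behind LOCC can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives no formulas, so the sketch assumes that the probability of each label is estimated as the fraction of an instance's k nearest training neighbours that carry the label, and that the classifiers are ordered from the most to the least probable label. The function name `local_label_order`, the use of Euclidean distance, and the toy data are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def local_label_order(x, train_X, train_Y, k=3):
    """Return (order, probs) for instance x.

    probs[j] is the fraction of the nearest training instances carrying label j
    (an assumed estimator); order lists label indices from most to least probable.
    """
    # Indices of the (at most) k training instances closest to x.
    neighbours = sorted(range(len(train_X)), key=lambda i: dist(x, train_X[i]))[:k]
    n_labels = len(train_Y[0])
    probs = [
        sum(train_Y[i][j] for i in neighbours) / len(neighbours)
        for j in range(n_labels)
    ]
    # Sort by descending probability; sorted() is stable, so ties keep index order.
    order = sorted(range(n_labels), key=lambda j: -probs[j])
    return order, probs


# Toy data (hypothetical): five 2-D instances, three binary labels each.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
train_Y = [(1, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 1), (0, 1, 1)]

order, probs = local_label_order((0.1, 0.1), train_X, train_Y, k=3)
print("chain order:", order)
print("local label probabilities:", probs)
```

For the query point (0.1, 0.1), all three nearest neighbours carry label 0, so its classifier would come first in the chain; labels 1 and 2 tie and keep their index order. A different query point can yield a different order, which is what makes the ordering local rather than fixed for the whole data set.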
Keywords/Search Tags: Multi-label learning, label ranking, weighting, k-nearest neighbor, label correlation, classifier chains