
Contributions To Several Issues Of Multi-Label Learning

Posted on: 2012-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Huang
Full Text: PDF
GTID: 2178330335990379
Subject: Computer application technology
Abstract/Summary:
Multi-label learning has recently become one of the hotspots of machine learning and data mining. It is widely used in text categorization, webpage classification, semantic scene classification, and the classification of gene functions in bioinformatics, so research on multi-label learning has practical significance and application value. Researchers have proposed many efficient solutions and methods, but many problems remain worthy of study.

Consider a data set in which each sample is represented by one instance and each instance carries only one label, but in which some samples are represented by the same instance with different corresponding labels. Such a classification problem also belongs to SIML (single-instance multi-label) learning. A traditional classifier predicts only one of the labels to which such a sample belongs. Under a traditional accuracy evaluation method, the classification results for some of these samples are then judged incorrect. In fact, the predicted label is one of the labels in the set to which the multi-label instance simultaneously belongs, so the classification is correct, yet current accuracy evaluation methods do not take this into account. Moreover, in real applications the number of samples for each label of a multi-label instance is often imbalanced. Different predictions then reflect different classifier performance, but current accuracy evaluation criteria cannot effectively distinguish between the performance of different classifiers. This thesis studies three aspects of this problem.

Some existing multi-label learning algorithms first learn a real-valued function that reflects the degree to which a sample belongs to a category, and then set a minimum threshold to decide whether the sample belongs to that category. If the threshold is set too high, not all of the labels are predicted; if it is set too low, the classifier predicts extra labels. Because the distribution of each category is different, it is inappropriate to set a single unified minimum threshold for all categories. In this thesis, a minimum threshold is set for each label according to that label's distribution. A label is predicted only if the value of the real-valued function is greater than the threshold set for that label.
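
The per-label thresholding idea can be illustrated with a minimal Python sketch. The abstract does not say how each threshold is derived from a label's distribution, so the selection rule below (pick, for each label, the candidate threshold that maximizes F1 on the training scores) is an assumption made only for illustration and is not the thesis's method. The function names `f1`, `fit_label_thresholds`, and `predict_labels` and the toy data are likewise hypothetical.

```python
from typing import List, Sequence


def f1(truth: Sequence[int], pred: Sequence[int]) -> float:
    """F1 score for one label; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fit_label_thresholds(scores: Sequence[Sequence[float]],
                         labels: Sequence[Sequence[int]]) -> List[float]:
    """Choose one threshold per label (assumed rule: maximize training F1).

    scores[i][j] is the real-valued output for sample i and label j;
    labels[i][j] is 1 if sample i carries label j, else 0.
    """
    thresholds = []
    for j in range(len(scores[0])):
        col = [row[j] for row in scores]
        truth = [row[j] for row in labels]
        distinct = sorted(set(col))
        # Candidates: one value below every score, then midpoints between
        # consecutive distinct scores.
        candidates = [distinct[0] - 1.0]
        candidates += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
        best_t, best_f1 = candidates[0], -1.0
        for t in candidates:
            # A label is predicted only if its score is strictly greater than t.
            score = f1(truth, [1 if s > t else 0 for s in col])
            if score > best_f1:
                best_t, best_f1 = t, score
        thresholds.append(best_t)
    return thresholds


def predict_labels(sample_scores: Sequence[float],
                   thresholds: Sequence[float]) -> List[int]:
    """Return the indices of labels whose score exceeds that label's threshold."""
    return [j for j, (s, t) in enumerate(zip(sample_scores, thresholds)) if s > t]


# Toy data (hypothetical): two labels whose scores are distributed differently.
train_scores = [[0.9, 0.2], [0.8, 0.6], [0.3, 0.7], [0.1, 0.1]]
train_labels = [[1, 0], [1, 0], [0, 1], [0, 0]]
thresholds = fit_label_thresholds(train_scores, train_labels)
```

On this toy data the two labels end up with different thresholds, so a score of 0.6 is enough to predict the first label but not the second, which is the behaviour a single unified threshold cannot express.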
Keywords/Search Tags:Machine Learning, Data Mining, Multi Label Learning, Single Label Learning, Classification, Accuracy Evaluation, Threshold Determination, Class Imbalance