
Research On Cost-sensitive Multi-label Classification Algorithms And Applications To Tag Recommendations

Posted on: 2016-10-27 | Degree: Master | Type: Thesis
Country: China | Candidate: C B Shen | Full Text: PDF
GTID: 2298330467972718 | Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of information technology has brought about information overload, which makes it difficult for people to find the information they need in massive amounts of data. Automatic classification is one effective way to address this problem and is widely used in many areas. Traditional classification usually assumes that each instance is associated with only one label. In domains such as images, text, and bioinformatics, however, instances are often associated with multiple labels, and traditional classification algorithms are of limited use there. Multi-label learning has therefore become an important research topic. Traditional classification also assumes that all classification errors have the same cost, whereas in real-world applications different errors often have quite different costs, so cost-sensitive learning plays an important role in such applications. Social tag recommendation is a recent research hot spot. Because social tags are correlated with one another and contain noisy information, both multi-label classification and cost-sensitive classification can be applied to it. This thesis focuses on multi-label classification and cost-sensitive classification and, based on the characteristics of social tags, studies how multi-label classification algorithms, cost-sensitive classification algorithms, and their combination can be applied to social tag recommendation.
First, we describe the concepts of multi-label classification and cost-sensitive classification, survey the relevant algorithms for each, and analyze their advantages and disadvantages. Second, we propose a multi-label classification algorithm based on label clustering. The algorithm identifies important unseen multi-labels through balanced k-means clustering of the labels, combines them with the original training set to form a new training set, and trains a classifier on the new training set, improving on existing label powerset (LP) methods. Experimental results on multi-label datasets show that the algorithm can find important unseen multi-labels and improve classification performance. Finally, based on the two characteristics of social tags noted above, we study social tag recommendation by modeling it separately as a multi-label problem and as a cost-sensitive problem. Building on this work, we then combine cost-sensitive classification with the proposed multi-label classification algorithm for social tag recommendation. Experimental results on social tagging datasets show that the combined algorithm outperforms both cost-sensitive classification alone and multi-label classification alone on regular evaluation metrics as well as cost-sensitive evaluation metrics.
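The limitation of LP methods that motivates the proposed algorithm can be illustrated with a minimal sketch. It assumes that "LP" means the standard label powerset transformation, in which each distinct label set in the training data becomes one class, and that an "unseen multi-label" is a label combination absent from the training set. The function names and the toy tags below are hypothetical and are not taken from the thesis; the balanced k-means step itself is not shown.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def label_powerset_encode(
    label_sets: Iterable[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map each distinct label set to one class id (label powerset)."""
    class_of: Dict[FrozenSet[str], int] = {}
    encoded: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            # First time this combination appears: give it a new class id.
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    return encoded, class_of


def unseen_combinations(
    class_of: Dict[FrozenSet[str], int],
    candidates: Iterable[Set[str]],
) -> List[Set[str]]:
    """Return the candidate label sets that never occur in the training data.

    A plain LP classifier can only output classes in `class_of`, so it can
    never predict any of the combinations returned here.
    """
    return [c for c in candidates if frozenset(c) not in class_of]


# Toy training labels (hypothetical tags, for illustration only).
train_labels = [{"python", "ml"}, {"python"}, {"ml", "nlp"}, {"python", "ml"}]
encoded, class_of = label_powerset_encode(train_labels)

# {"python", "nlp"} never occurs in training, so LP alone cannot predict it.
missing = unseen_combinations(class_of, [{"python", "nlp"}, {"python"}])
```

In this sketch a single-label classifier would be trained on `encoded`. Adding selected unseen combinations to the training set, as the abstract describes, would enlarge `class_of` so that those combinations become predictable.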
Keywords/Search Tags: Data Mining, Multi-label, Cost-sensitive, Classification, Social Tag Recommendations