On Multi-Label Classification Algorithms Based On Label-Specific Features And Mutual Neighbor

Posted on: 2013-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Qu
Full Text: PDF
GTID: 2248330374993066
Subject: Computer software and theory
Abstract/Summary:
With the continuing emergence and rapid development of the Internet, information retrieval, and other new technologies, large amounts of data have accumulated in real applications. These data often conceal information and knowledge that reflect the underlying patterns of change in things. To use the data generated in various applications effectively, useful information and knowledge must be mined from them and applied widely in areas such as business management, production control, market analysis, engineering design, and scientific exploration. It is the sharp increase in these requirements that drives the development of data mining.

Classification, as a data analysis method, is one of the most active topics in data mining research; it can be used to extract models that describe important data classes or predict future trends. According to the number of labels each sample has, classification is divided mainly into single-label classification and multi-label classification. Multi-label classification arises frequently in real applications and has received growing attention because of its theoretical significance and application value.

As the relevant research has deepened, the importance and value of multi-label classification have gradually become apparent. At present, different types of multi-label classification algorithms have been proposed to solve various practical problems by combining a variety of learning techniques. Nevertheless, the relations between categories and attributes, and the factors that affect the efficiency of the algorithms, are still not understood deeply enough.
For example, when dealing with multi-label data, most multi-label classification algorithms do not take into account that different sets of features contribute differently to each class label. Likewise, although the idea of k nearest neighbors is widely used in multi-label classification, several issues remain to be settled: classification performance is easily disturbed by noisy data, and the optimal value of k is difficult to determine. Solving these issues has research significance and practical reference value, both for the theory of data classification and for applications of multi-label classification.

This thesis studies multi-label classification; its main research content and contributions are as follows:

· Aiming at the interrelation between class labels and features, and at their contribution to the performance of multi-label classification, a multi-label classification algorithm based on label-specific features is presented. A feature density is first computed on the positive and on the negative instance set of each class. The mk features of highest density are then chosen from the positive and from the negative instance set of each class, respectively, and their intersection is taken as the label-specific features of the corresponding class. Finally, multi-label data is classified on the basis of these label-specific features.
· Inspired by the idea of k nearest neighbors, a multi-label classification algorithm based on k mutual neighbors is presented. The significance of a neighbor is assessed by using the concept of mutual neighbors to distinguish true neighbors from false ones. Real and reliable neighbor information is thereby obtained, and anomalies are eliminated on the basis of mutual neighbors. Finally, the label set of an unlabeled sample is predicted from the reliable neighbor information.
At the same time, noise data in the original data set is eliminated by computing the mutual neighbors of each sample; the quality of the data set improves as a result, which makes it easier to train a stronger classifier.
· In the experimental part, experiments are carried out on several benchmark data sets, and the proposed algorithms are compared with several state-of-the-art multi-label classification algorithms to verify their effectiveness. The results for the algorithm based on label-specific features indicate that it outperforms the other algorithms and, in addition, makes clear which features belong to which class label. The results for the algorithm based on k mutual neighbors show that it outperforms the other algorithms and that the concept of mutual neighbors can indeed be used to identify noisy data in the initial data sets and improve their quality.
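The label-specific feature selection in the first contribution can be illustrated with a minimal Python sketch. The abstract does not define "feature density", so the sketch assumes it is the fraction of instances in which a feature is non-zero; the function names `feature_density` and `label_specific_features`, and the single parameter `m` (standing in for mk), are hypothetical and not taken from the thesis.

```python
import numpy as np

def feature_density(X):
    """Assumed definition: share of instances where each feature is non-zero."""
    X = np.asarray(X)
    if X.shape[0] == 0:
        # No instances in this set: every feature gets density 0.
        return np.zeros(X.shape[1])
    return (X != 0).mean(axis=0)

def label_specific_features(X, Y, m):
    """For each label, intersect the m densest features of the positive
    instance set with the m densest features of the negative instance set."""
    X, Y = np.asarray(X), np.asarray(Y)
    selected = {}
    for k in range(Y.shape[1]):
        pos = X[Y[:, k] == 1]   # instances that carry label k
        neg = X[Y[:, k] == 0]   # instances that do not
        # Stable sort so that ties are broken by feature index.
        top_pos = np.argsort(-feature_density(pos), kind="stable")[:m]
        top_neg = np.argsort(-feature_density(neg), kind="stable")[:m]
        selected[k] = sorted(set(top_pos.tolist()) & set(top_neg.tolist()))
    return selected

# Toy data: 4 instances, 4 binary features, 1 label.
X = [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 1, 1, 1]]
Y = [[1], [1], [0], [0]]
print(label_specific_features(X, Y, m=2))
```

A per-label classifier would then be trained on the selected columns only; that step is omitted here because the abstract does not say which base classifier is used.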
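The k mutual neighbor idea in the second contribution can be sketched as follows. Two samples are treated as mutual neighbors when each appears among the other's k nearest neighbors. Several details are assumptions rather than the thesis's stated method: Euclidean distance, treating a sample with no mutual neighbors as noise, and predicting a label when at least half of the query's mutual neighbors carry it. All function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row (Euclidean, assumed)."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbor
    return np.argsort(d, axis=1, kind="stable")[:, :k]

def mutual_neighbors(X, k):
    """For each sample i, the samples j with j in kNN(i) and i in kNN(j)."""
    nn = [set(row.tolist()) for row in knn_indices(X, k)]
    return [sorted(j for j in nn[i] if i in nn[j]) for i in range(len(nn))]

def remove_noise(X, Y, k):
    """Assumed noise rule: drop samples that have no mutual neighbor."""
    keep = np.array([len(m) > 0 for m in mutual_neighbors(X, k)])
    return np.asarray(X)[keep], np.asarray(Y)[keep]

def predict_labels(X_train, Y_train, x, k, threshold=0.5):
    """Vote over the query's mutual neighbors in the training set."""
    X_all = np.vstack([X_train, x])      # the query becomes the last row
    mutual = mutual_neighbors(X_all, k)[-1]
    Y_train = np.asarray(Y_train)
    if not mutual:                       # no reliable neighbor: predict no label
        return np.zeros(Y_train.shape[1], dtype=int)
    votes = Y_train[mutual].mean(axis=0)
    return (votes >= threshold).astype(int)

# Toy data: three close points and one outlier, two labels.
X = [[0.0], [0.1], [0.2], [5.0]]
Y = [[1, 0], [1, 0], [1, 1], [0, 1]]
X_clean, Y_clean = remove_noise(X, Y, k=2)
print(len(X_clean), predict_labels(X_clean, Y_clean, [0.05], k=2))
```

In the toy data the outlier at 5.0 lists two of the clustered points as nearest neighbors, but none of them lists it back, so it has no mutual neighbors and is filtered out before prediction.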
Keywords/Search Tags: multi-label classification, label-specific features, mutual neighbors, noise data