
Seed Node Selection And Link Prediction Based Multi-Label Classification

Posted on: 2017-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhao
GTID: 2308330485462220
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet and information technologies, network data mining plays an increasingly important role and has been successfully applied to user behavior analysis, document classification, image classification, and other tasks. Classification problems can be divided into single-label and multi-label classification. When each object to be classified is associated with exactly one label from a set of candidate classes, the problem is referred to as single-label classification; when an object is associated with multiple labels simultaneously, the problem is called multi-label classification. Collective classification methods exploit the homogeneity between objects and classify related instances in the network simultaneously, which can yield better classification performance; they are mainly applicable to single-label classification in homogeneous information networks. In the real world, however, people often encounter multi-relational networks, in which entities are usually associated with multiple labels and the connections between entities are diverse and often carry different semantic meanings. Most collective inference models do not differentiate between these kinds of connections and therefore struggle to achieve high classification accuracy. The multi-label classification problem in multi-relational networks has consequently attracted wide attention from researchers.

Existing multi-label classification algorithms usually select the training nodes at random from the network, so the classification results are unstable and the classification accuracy is not high. Drawing on active learning, we put forward the SHDA algorithm. By exploiting the topology of the network, we partition it into different affiliations and select high-degree nodes from each affiliation. The selected nodes are then merged and, after further processing, yield the seed nodes.
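The seed selection steps described so far (partition into affiliations, pick high-degree nodes in each, merge) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's SHDA implementation: the function name `select_seed_nodes`, the `per_affiliation` parameter, the assumption that affiliations are already given as node sets, and the use of simple de-duplication as the "processing" step are all hypothetical choices made for the example.

```python
from collections import defaultdict


def select_seed_nodes(edges, affiliations, per_affiliation=1):
    """Pick the highest-degree nodes of each affiliation and merge them.

    edges: iterable of (u, v) pairs of an undirected network.
    affiliations: dict mapping an affiliation name to a set of nodes
        (assumed to be computed beforehand; a node may appear in several).
    per_affiliation: hypothetical budget of nodes taken per affiliation.
    """
    # Node degree = number of incident edges.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    seeds, seen = [], set()
    for name in sorted(affiliations):
        # Rank members by degree (descending); break ties by node id.
        ranked = sorted(affiliations[name], key=lambda n: (-degree[n], n))
        for node in ranked[:per_affiliation]:
            # Merging step: a node chosen by several affiliations is kept once.
            if node not in seen:
                seen.add(node)
                seeds.append(node)
    return seeds


# Toy network with two overlapping affiliations (illustrative data only).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g")]
affiliations = {"g1": {"a", "b", "c", "d"}, "g2": {"d", "e", "f", "g"}}

print(select_seed_nodes(edges, affiliations, per_affiliation=1))
print(select_seed_nodes(edges, affiliations, per_affiliation=2))
```

With a budget of two nodes per affiliation, node `d` is selected by both affiliations but appears only once in the merged seed set, which is the point of the merge step in this sketch.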
With the seed nodes labeled and used as the training set for classifying multi-labeled data, the SHDA algorithm aims to maximize performance with a minimal node set and improves classification accuracy.

In real life, network data may be incomplete and therefore fail to reflect the true relationships between entities in the real world. To address this situation, we propose the LP-SCRN algorithm. It uses an even-step link prediction algorithm to predict missing links in the network and then calculates the weights of these links according to the similarity between the nodes’ social features. A node’s label set is estimated from its neighbors’ class labels, its class propagation probability, and the normalized weight between the node and its neighbors. The relaxation labeling approach is adopted to update the prediction probabilities iteratively. The LP-SCRN algorithm combines link prediction with a multi-label relational neighbor classifier, and experiments on several real datasets show that the proposed algorithm can improve multi-label classification performance in multi-relational networks.
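The pipeline described above (predict missing links, weight links by feature similarity, then update label probabilities iteratively) might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the LP-SCRN algorithm itself: a common-neighbor rule stands in for the even-step link predictor, cosine similarity stands in for the social-feature similarity, the class propagation probability is omitted, and all function names, the `min_common` parameter, and the 0.5 decision threshold are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_links(adj, min_common=2):
    """Stand-in link predictor: propose u-v if they share enough neighbors."""
    return [
        (u, v)
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u] and len(adj[u] & adj[v]) >= min_common
    ]


def relaxation_labeling(adj, features, known, classes, iterations=20):
    """Update per-class probabilities of unlabeled nodes from weighted neighbors."""
    prob = {}
    for n in adj:
        if n in known:  # labeled nodes keep fixed 0/1 probabilities
            prob[n] = {c: 1.0 if c in known[n] else 0.0 for c in classes}
        else:  # unlabeled nodes start undecided
            prob[n] = {c: 0.5 for c in classes}
    weight = {(u, v): cosine(features[u], features[v]) for u in adj for v in adj[u]}
    for _ in range(iterations):
        updated = {}
        for n in adj:
            total = sum(weight[(n, m)] for m in adj[n])
            if n in known or total == 0:
                continue
            # Normalized, similarity-weighted vote of the neighbors per class.
            updated[n] = {
                c: sum(weight[(n, m)] * prob[m][c] for m in adj[n]) / total
                for c in classes
            }
        prob.update(updated)  # synchronous update of all unlabeled nodes
    return prob


# Toy data (illustrative only): a 4-node network with two labeled nodes.
adj = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b"}, "d": {"a", "b"}}
features = {"a": [1, 0], "b": [1, 0], "c": [1, 1], "d": [1, 1]}
known = {"a": {"x"}, "c": {"y"}}
classes = ["x", "y"]

links = predict_links(adj)
for u, v in links:  # add the predicted links before classification
    adj[u].add(v)
    adj[v].add(u)

prob = relaxation_labeling(adj, features, known, classes)
labels = {n: {c for c in classes if prob[n][c] > 0.5} for n in adj}
print(links)
print(labels)
```

In this toy run the predicted link between `a` and `b` carries a high weight because their features match, so `b` leans toward the label of `a`, while `d` leans toward the label of `c`. Each class is thresholded independently, so a node can receive several labels or none.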
Keywords/Search Tags: multi-label classification, multi-relational network, collective classification, link prediction, seed nodes