Font Size: a A A

Clustering Algorithm Research Based On Semi-supervised GN Algorithm

Posted on:2017-05-17Degree:MasterType:Thesis
Country:ChinaCandidate:Y Z YuFull Text:PDF
GTID:2308330482980516Subject:Software engineering
Abstract/Summary:PDF Full Text Request
In the era of "Internet plus", the explosive growth of network data leads to the phenomenon processing and analyzing the data in the network has become a thorny issue. There is an effective way called community detection which process and analyze data in a way that the data in the network can be divided into several communities quickly according to a certain attribute. Current community discovery methods have to specify prior community number, to determine in advance after partition size of each community or other on the scope of community information as well as pre specified evaluation index and other defects, and most of the community found algorithm is unsupervised category, can not have effective use of known a priori knowledge. The classical GN(Givern-Newman) algorithm is a kind of split hierarchical clustering algorithm based on heuristic algorithm, which has high accuracy and can be applied to small and medium network. But GN algorithm can not use the known prior knowledge and its time complexity is relatively high. In this paper, aiming at the problem,we improving GN algorithm, The main research achievements are as follows:(1) GN clustering algorithm based on semi-supervisedIn order to solve the problem that the classical GN algorithm can not take advantage of the known prior information, this paper proposes a semi-supervised GN clustering algorithm, denoted as S-GN(based on semi-supervised Givern-Newman). S-GN algorithm uses based on the constrained semi-supervised learning method, learning the known must-link constraints, cannot-link constraints information, to learn knowledge through based on distance measurement method is verified to improve the quantity and quality of the prior information. The constraints of the known constraints on the integration of the network, delete the Cannot-link constraint information from the network. Through the iterative calculation of boundary value of the betweenness and deleting the full-betweenness of the maximum value to the network edge, to divide the state into the best community. In artificial and real networks analysis S-GN algorithm, the time complexity of the S-GN algorithm needs to be improved, but it can improve the accuracy and efficiency of the algorithm.(2) semi-supervised GN clustering algorithm based on similarityIn order to solve the problem of high time complexity, an improved GN algorithm based on the similarity of cluster analysis is proposed, which is denoted as S-SGN(based on similarity and semi-supervised Givern-Newman). By using the known and learned prior information, the relationship between the node pair and the corresponding value in the initial matrix of the network is modified according to the different similarity construction method. By iterative calculation of the value of the node similarity, delete the edge of the smallest similarity value to divide the state into the best community. In the artificial and real networks, S-SGN algorithm reduced the time complexity of S-GN algorithm, And further improve the operational efficiency and performance.
Keywords/Search Tags:Community detection, semi-supervised, GN algorithm, similarity
PDF Full Text Request
Related items