Clustering Algorithm Research Based On Semi-supervised GN Algorithm

Posted on:2017-05-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y Z Yu

Full Text:PDF

GTID:2308330482980516

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

In the era of "Internet plus", the explosive growth of network data leads to the phenomenon processing and analyzing the data in the network has become a thorny issue. There is an effective way called community detection which process and analyze data in a way that the data in the network can be divided into several communities quickly according to a certain attribute. Current community discovery methods have to specify prior community number, to determine in advance after partition size of each community or other on the scope of community information as well as pre specified evaluation index and other defects, and most of the community found algorithm is unsupervised category, can not have effective use of known a priori knowledge. The classical GN(Givern-Newman) algorithm is a kind of split hierarchical clustering algorithm based on heuristic algorithm, which has high accuracy and can be applied to small and medium network. But GN algorithm can not use the known prior knowledge and its time complexity is relatively high. In this paper, aiming at the problem,we improving GN algorithm, The main research achievements are as follows:(1) GN clustering algorithm based on semi-supervisedIn order to solve the problem that the classical GN algorithm can not take advantage of the known prior information, this paper proposes a semi-supervised GN clustering algorithm, denoted as S-GN(based on semi-supervised Givern-Newman). S-GN algorithm uses based on the constrained semi-supervised learning method, learning the known must-link constraints, cannot-link constraints information, to learn knowledge through based on distance measurement method is verified to improve the quantity and quality of the prior information. The constraints of the known constraints on the integration of the network, delete the Cannot-link constraint information from the network. Through the iterative calculation of boundary value of the betweenness and deleting the full-betweenness of the maximum value to the network edge, to divide the state into the best community. In artificial and real networks analysis S-GN algorithm, the time complexity of the S-GN algorithm needs to be improved, but it can improve the accuracy and efficiency of the algorithm.(2) semi-supervised GN clustering algorithm based on similarityIn order to solve the problem of high time complexity, an improved GN algorithm based on the similarity of cluster analysis is proposed, which is denoted as S-SGN(based on similarity and semi-supervised Givern-Newman). By using the known and learned prior information, the relationship between the node pair and the corresponding value in the initial matrix of the network is modified according to the different similarity construction method. By iterative calculation of the value of the node similarity, delete the edge of the smallest similarity value to divide the state into the best community. In the artificial and real networks, S-SGN algorithm reduced the time complexity of S-GN algorithm, And further improve the operational efficiency and performance.

Keywords/Search Tags:

Community detection, semi-supervised, GN algorithm, similarity

PDF Full Text Request

Related items

1	Clustering Algorithm Research Based On Semi-supervised GN Algorithm
2	Semi-supervised Community Detection Algorithm Based On Local Label Information
3	Research On Semi-supervised Community Detection Method Based On Distance Dynamics And Anti-noise Performance
4	Research On Semi-supervised Community Detection Algorithm
5	Research On Robust Semi-supervised Community Detection Method
6	Research On Semi-supervised Clustering And Classification Algorithm
7	Research On Several Problems Of Community Detection In Web Networks
8	Research Of Active Semi-supervised Clustering And Its Application In Community Detection
9	Outlier Detection And Semi-supervised Clustering Algorithm Based On Shared Nearest Neighbors
10	Research On Adaptive Selection Of Distance Metric Functions In Semi-Supervised Classification