Font Size: a A A

Clustering Algorithm Based On Attributes And Relationships

Posted on:2015-03-14Degree:MasterType:Thesis
Country:ChinaCandidate:Y X ShengFull Text:PDF
GTID:2298330467458433Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, with the rise of social networks such as social networks, biological networks, the study of social networks has got more and more attentions. Typically, social network can be modeled as graphs, where nodes represent the objects that we studied and the edges indicate relationships among them. So we can study those graphs to find the potential relationships among objects by putting the nodes with similar relationships into the same group to help people understanding the structure and function of social networks from different perspectives.More and more social network resources have emerged recently. In order to understand the structure and function of a social network, we cannot just use information of object’s attribute or relationships among objects. It is necessary to explore information provided by both attribute and relationships of objects. At present, most clustering algorithms or graph clustering algorithms pay attention to just one of attributes of nodes or relationships of nodes. A little algorithm not only takes advantage of attributes but also relationships, such as SNAP. These algorithms can summarize social network well, however, they have limits to some extent. In the study of social networks, clustering algorithms can be roughly divided into four categories, the first one is based on hierarchical, the second one is based on partition, the third one is based on grid and density, the last one uses other approach. The partition clustering algorithms are simple and high efficient, which also have some flexibility for large data sets, so this kind of algorithm has been applied more widely. In this paper we mainly want to improve the corresponding algorithm based on the partition. After SNAP has proposed, researchers also do some improvements with SNAP algorithm, such as the later one K-SNAP. Although these algorithms can take both attributes and relationships into consideration, they also have shortcomings. For example, during the phase of initial partition by attributes, if the range of attribute is large, SNAP will divide nodes into many small clusters, and this is obviously not an efficiency method for community-division.In this paper, based on studying of some classical clustering algorithms, we propose an algorithm, which improves the K-SNAP in the initial division of attributes and in the partition criterion of nodes. Firstly, we pre-process numerical attributes based on the approach of CANAL to decrease the number of useless groups. Then, we use Q function theory to introduce density as a new criterion when clustering nodes by relationships, so we can get groups of realistic meaning. Finally, we validate the effectiveness of our approach through an extensive experimental study over a real data set. The results show that our approach is effective and efficient.
Keywords/Search Tags:SNAP, Attribute, Relationship, Clustering, Modularity
PDF Full Text Request
Related items