Font Size: a A A

Research On K-means-Type Community Detection In Complex Networks

Posted on:2018-01-17Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y F LiFull Text:PDF
GTID:1318330518989463Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of information science and technology, a large num-ber of social platforms emerge and boost, such as Twitter, Facebook, Sina Microblog,WeChat, which have been constantly changing the way of people's living and working.These online social platforms produce massive complex network data sets every day, in-cluding node connections and node attributes. The statistical properties, nature structures,and interaction rules of social networks underlie these linked data. The content data pro-vides rich digital images, texts, video, and other kinds of node attribute feature. Exploring and analyzing the latent structures of these networks provide unprecedented challenges and opportunities for different areas, like machine learning and data mining.Extracting communities is a basic problem in complex network analysis, which play an important role in data partition, compressing, visualization, recommendation and so on. Since the problem was presented, a body of approaches have been proposed, of which,the K-means algorithm is a classic partition clustering algorithm. Due to its simplicity,efficiency and scalability for large scale data sets, it has a wide application in linked data clustering. However, the K-means algorithm has some weakness: 1) It is sensitive to initial centers, which dramatically affect the clustering performance; 2) It requires a predefined cluster number. For network data, how to design an effective initializing strategy is for further development. Besides, the network data is usually sparse and noisy,when the community structure is unclear, how to utilize auxiliary information in networks to detect meaningful community structure is challenging.In this thesis, we research on network data and address the issues to select cluster number and initial centers, to extract communities by combining node attributes in net-work data. Besides, we exploit the problem of attributed graphs clustering with edge uncertainty. The main contributions of this dissertation are in the following four folds:1) We propose a parameter-free community detection method. In this method, ac-cording to the characteristic of the network data, two measurements evaluating centers are proposed based on node centrality and dispersion to decide community number and initial centers. This initialization method can provide a guideline for K-means-type com-munity detection. Compared with other initial centers selecting algorithms, our proposed method does not require any parameter. Experimental results on synthetic and real-world networks demonstrate the superior performance of our approach over competing methods for clustering nodes. Based on this work, we propose a simple and flexible community detection approach by enhancement of k nearest neighbor graph of node attributes (kNN-enhance). We add the k nearest neighbor graph of node attributes to alleviate the sparsity and the noise effect of an original network, thereby strengthening the community struc-ture in the network. Analyses on synthetic and real world networks have shown that the proposed algorithm achieves better performance compared to existing state-of-the-art al-gorithms. Further, the algorithm is able to deal with networks containing different types of attributes, like binary, categorical, or numerical.2) We propose a semi-supervised community detection algorithm with active node and link selection. For a network with unclear community structure, it is difficult to de-termine the exact community number and select the initial central nodes. Besides, nodes are cline to be miss-classified. Based on active learning, an active node and link selection strategy is presented. This is a two-way strategy, which improves the accuracy of com-munity detection by agglomerating nodes in a community and sharpening the boundaries between communities. By active centers selection, it can automatically decide the num-ber of community and select initial central nodes. With a tiny of prior information, the proposed method can significantly improve the performance of community detection.3) We propose a novel weighted K-means algorithm with "locak" learning for at-tributed graph clustering, called Adapt-SA (Adaptive fusion of Structural and Attribute information). The key advantage of this model is to automatically balance the struc-tural connections and attribute information of each node to learn a fusion weight, and get densely connected clusters with high attribute semantic similarity. Experimental s-tudy of weights on both synthetic and real-world data sets showed that the weights learnt by Adapt-SA were reasonable, and they reflected which one of these two types of in-formation was more important to decide the membership of a node. We also compared Adapt-SA with the state-of-the-art algorithms on the real-world networks with varieties of characteristics. The experimental results demonstrated that our method outperformed the other algorithms in partitioning an attributed graph.4) We study the problem of node clustering on attributed uncertain graphs. In real ap-plications, networks are usually embedded with uncertain edges and abundant attributes.For this kind of complex networks, we propose two novel approaches: Attributed un-certain graphs clustering based upon integrated attribute induced graphs (AUG-I) and attributed uncertain graphs clustering based upon based upon the unified partition (AUG-U) over possible worlds of a uncertain graph. The uncertain edges can help identify the set of relevant attributes in the nodes. While the focus attributes can help reduce the uncertainty in edges for community detection. Extensive empirical studies on synthetic and real-world data sets demonstrate the effectiveness of our approaches for clustering tasks on attributed uncertain graphs. Besides, we analyze the robustness of our proposed methods in different parameter settings.
Keywords/Search Tags:complex network analysis, community detection, K-means algorithm, initial centers selection, community number, uncertainty
PDF Full Text Request
Related items