Font Size: a A A

Research On Several Problems Of Community Detection In Web Networks

Posted on:2016-05-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:1108330482479099Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet and communication technologies,many new kinds of social networks widely emerge in real world applications, and their structures become more and more huge and complicated.However, these conditions raise two great challenges for us:First, when we can only get some limited or local information about a large-scale network, how to use the limited local information to find the local community quickly and efficiently? Secondly, in the heterogeneous networks composed of multimode nodes and multi-relational links, how to reveal the hidden community structure underlying heterogeneous interactions?In this context, the traditional community detection methods are often no longer applicable. Since these works cannot work well on discovering the hidden community structure underlying in the local networks and heterogeneous networks. In order to detecting the locality and heterogeneity of Web networks,we have made several beneficial exploration and research for discovering the community structures of them.The main work is as follows:1.Due to the boundary nodes are hard to be clustered accurately, a local community detection algorithm is proposed based on the identification of boundary nodes.It uses two self-adaptive phases in detecting the local community, thus comprehensively considering the external and internal link similarity of neighborhood nodes in each clustering iteration.Based on boundary nodes identification,our self-adaptive method can effectively control the scale and scope of the local community. Different from the traditional local community detection methods whose idea is to maximize the local community structure indexes,this algorithm uses the node similarity modularity to measure the performance of the local community discovery. In this way, our method can effectively solve the two big problems existed in the related work, including when to stop the clustering iteration and the clustering results sensitive to the given node’s position.2.In order to overcome the limitation caused by the randomness of the given node’s position, a local community detection algorithm is propose based on the core nodes jumping. This method tries to avoid expanding directly from a given node, but firstly search the core nodes near around the given node, and then jump to the core nodes and expand them as the core sub-group.We develop a novel fitness function of the sub-groups, and finally obtain the local community structure by combining these subgroups according to their fitness.Compared to the traditional algorithms, this method can switch from any given node to its neighbouring core nodes, thus prevent the local community discovery from the limit caused by the given node’s position and influence. Moreover, this method can precisely control the expansion of the local community, and obtain the distribution of the core nodes in the range of the community.3.Extended from the bipartite graph, a novel network model is proposed to depict the heterogeneous networks as a number of correlated bipartite graphs. This model can describe the structure of the heterogeneous networks, and indicate the complicated relationship between the multi-mode nodes. Moreover, it provides certain model for multimode heterogeneous network structure of subsequent data mining based (theory).This model successfully makes the two different kinds of networks,the two-mode networks and the multi-mode/multi-dimensional networks, be unified and compatible in concept. Thus, it lays the groundwork for mining and analyzing the structure of heterogeneous networks, and makes it possible to introducing more theories and methods to detect the structures of heterogeneous networks.4. For the two-mode networks, a nonnegative tri-matrix factorization algorithm (F-NMTF) is proposed based on graph regularized. To estimate the structural characteristics of user space and object space, this method constructs the affinity graphs of two-mode nodes respectively, and takes them as the regularization constraint to introduce into the NMTF model.In this way, we can use both the inter-type information and intra-type information of two-mode networks, which can effectively enhance the orthogonality and the convergence of the nonnegative matrix factorization,and overcome the limit of 2-mode structure. Then this method divides NMTF into two sub-problems for solving the minimization function, and develops an alternative multiplicative update rules for the matrix factorization iteration, which can make the factorization more simple and fast. Experiment and analysis on synthetic and real bipartite networks show that F-NMTF achieves high accuracy and stability on the results of community detection, which is efficient and well-behaved in detecting community structure of bipartite networks.5.For the heterogeneous networks, a joint nonnegative matrix factorization (Joint-NMF) algorithm is proposed based on the heterogeneous network model. Firstly, this method transforms the heterogeneous dataset into a number of bipartite graphs correlated. Taking inspiration from the multi-view method, we extend the semi-supervised learning based on single graph to several bipartite graphs with multi-view. In this way, it provides mutual information between different network graphs to realize the collaborative learning of different classifiers. During the process of clustering, Joint-NMF algorithm comprehensively considers the structure of all bipartite graphs, makes all the classifiers tend to reach a consensus on the clustering results of the target-mode nodes.The experimental results show that, for heterogeneous networks, our semi-supervised Joint-NMF algorithm has better performance than the unsupervised algorithms on the community detection.
Keywords/Search Tags:Community Detection, Boundary Identification, Core Nodes, Local Community, Bipartite Graph, Heterogeneous Networks, Nonnegative Matrix Factorization, Semi-supervised Learning
PDF Full Text Request
Related items