Font Size: a A A

Key Technologies Research On Community Detection In Social Networks

Posted on:2013-03-14Degree:DoctorType:Dissertation
Country:ChinaCandidate:W Q LinFull Text:PDF
GTID:1268330422974170Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
A social network is a social structure made up of a set of entities such asindividuals or organizations, and the dyadic ties between these entities. Generally, asocial network is related with the research goals and can be used to describe differentrelationships in different fields. Commonly, social networks include human interactingnetworks, enzymatic protein networks, biologic chain networks, metabolic networks anddifferent kinds of virtual interacting networks of the Internet. The social network can bedescribed by a dynamical evolution graph, in which each node stands for an entity andeach edge between two nodes stands for the interaction between them. Communities areimportant components of the social networks. Usually, the community is formed with aset of nodes, in which the nodes are densely connected internally and loosely connectedexternally. Community detection is critical for the topological structures understanding,function analysis, and node actions prediction of the social networks. Hence, the socialnetwork has important theoretical and practical value on research and applications.Moreover, community detection is commonly used in many applications such asterrorist organization identification, metabolic pathway prediction, search enginedevelopment and public sentiment monitoring, etc. In this paper, according to differentapplication environment, we focus on community detection in multigraph socialnetwork, bipartite social network, incomplete information social network and the socialnetwork with multi-type nodes respectively. The contributions of this paper include:Firstly, a parallelizable method based on hierarchical subgraph trees forcommunity detection in multigraph social network is proposed. According to themathematical definition of multigraph, the multigraph is permitted to have multipleedges between any two nodes, and each edge is permitted to attach with a naturalnumber weight. We use the multigraph to describe the social networks. Based on themultigraph model, the density of any subgraph in the social network is defined bysimultaneous considering the sum of the weight and the total number of the edges in thesubgraph. In order to generate the hierarchical subgraph tree (called HAT), aparallelizable algorithm for generating hierarchical communities in static social networkis proposed. Considering the dynamical social network only changes in a small localarea in the short continuous time, a parallelizable algorithm for generating hierarchicalcommunities in dynamic social network is designed. Through the experiments on realand synthetic datasets, we demonstrate our algorithms are more accurate than thecompared algorithms such as GN etc. Further experiments on big data social network,which has more than ten millions of nodes and edges, validate our algorithms arescalable and fit for community detection in the big data social networks.Secondly, a community detection method based on closeness and Mahalanobis distance for the incomplete information social network is presented. We assume eachnode in the incomplete information social network is attached with a set of attributes,which is described by a vector. We also assume there are many local completeinformation regions in the incomplete information network. Moreover, in each localcomplete information region, the complete link information is provided, but other linkinformation outside of the local complete information areas is unknown or incomplete.The main idea of detecting the community in incomplete information networks is asfollows. We first learn a global distance metric from the local information regions withcomplete link information. Then, we use the global metric to measure the distancebetween any pair of nodes in the network. Because the metric is learned from thestructure of the network, the distance should also reflect the missing link structure in thenetwork. In order to learn such a global distance metric, the closeness between any pairof nodes is defined by considering how many neighbors they shared and how manyneighbors they have respectively. The value of closeness ranges from0to1, and thelarger value of closeness between any two nodes, the closer between them from theview of link structure. Subsequently, the closeness between any pair of nodes in eachlocal complete information region can be computed. Consequently, the problem oflearning the global distance metric is transformed into learning a Mahalanobis matrixwhich satisfies: for any pair of nodes in any local complete information region, thehigher value of closeness between them, the short distance between them after beingprojected in the Mahalanobis space. Finally, a distance-based community detectionalgorithm for incomplete information social network is presented. To speed up theclustering process, a heuristic method for generating the-approximation communityis designed. The experiments on real data sets demonstrate the effectiveness andefficiency of the proposed methods.Thirdly, an overlapping community detection method based on imagetransformation for bipartite social network is presented. The bipartite social network ismade up of two sub-networks. Particularly, in bipartite social network, edges can onlyexist between nodes from different sub-networks, and not exist between the nodes fromthe same sub-networks. The problem of community detection in bipartite social networkis to discover the community pairs which exist in different sub-networks of the bipartitesocial network. Assume there are m nodes in one sub-network and n nodes in anothersub-network. Then, the bipartite social network can be described by a0/1relationalmatrix or a white and black image. Then, the community detection problem in bipartitesocial network is transformed into how to adjust the rows and column of the white andblack image for clustering the black pixel together as much as possible. Based on theHuffman coding theory for image, we try to compress the total information which isused to describe the image by adjusting the positions of rows and columns. Then, a greedy co-clustering algorithm, which can discover the row clusters and column clusters,is improved. The blocks, which are at the crossing of the row clusters and columnclusters, either have a very high or a very low density of pixel. In order to detect theoverlapping row clusters and column clusters, a gain function is defined. Next, anoverlapping co-clustering algorithm based on the gain function is designed. The mainidea of this algorithm is that: for any row (column), if it can be put into more than onerow (column) clusters, it should improve the value of the gain function, otherwise, itshould not be put into the new row (column) cluster. The experiments on synthetic andreal data sets validate the effectiveness and efficiency of the proposed methods.Fourthly, a community detection method for social networks with multi-type nodesis presented. In the social networks with multi-type nodes, the edges exist betweennodes from not only the same type but also different type. Assume there are n types ofnodes in the social network, we can use n(n+1)/2images at the most to describe therelationships between different types of nodes in the social network. We firstconcatenate the n images, which share the same type of nodes. Then, for the second typeof nodes, the left n-1images are concatenated together as the first type of nodes. Thisprocess is repeated until all of the images are dispatched to different concatenatedimages. According to the Huffman coding theory for image, the problem of communitydetection in social networks with multi-type of nodes is transformed into a new problemwhich is how to compress the set of concatenated images by adjusting the positions ofrows and columns in the concatenated images. Since the new problem was proved to beNP-hard, two greedy algorithms, which are used to detect communities when thenumber of communities are provided and not provided, are presented respectively.Experiments on synthetic and real datasets demonstrate the presented algorithms aremore effective than the compared methods such as NMF.
Keywords/Search Tags:Social Networks, Community Detection, HierarchicalCommunities, Big Data, Incomplete Information Social Networks, Bipartite SocialNetworks, Multi-type
PDF Full Text Request
Related items