Discovery Of Web Communities Based On Social Network Analysis

Posted on: 2014-11-27; Degree: Master; Type: Thesis
Country: China; Candidate: X Shu; Full Text: PDF
GTID: 2268330401476357; Subject: Computer software and theory
Abstract/Summary:
Micro-blogging is an emerging informal community tool that has developed rapidly in the wake of blogging. As a new medium, it plays a major role in Chinese social networking: its fragmentary messages penetrate many areas of social life and have aroused widespread enthusiasm for micro-blogging services. Micro-blog growth is massive, and communities form around different domains and interests. As important components of the micro-blog social network, these communities not only provide users with timely, reliable, and valuable information for making friends, but also offer a new mode of transmission for commerce and media. Detecting micro-blog communities is therefore of great value.

Community discovery mainly consists of dividing a network into groups, in order to accelerate the process of spontaneous community formation. A variety of approaches have been taken to make the idea of communities concrete, giving rise to a number of efficient methods for community identification. Early solutions are generally based on hierarchical approaches, which analyze link structure or content and produce a tree of communities called a dendrogram. However, modes of communication are changing, and traditional algorithms are no longer suitable for emerging social network tools because of their low efficiency and accuracy. In recent years, many classification and clustering algorithms have also been proposed for these tools. In this context, we develop a micro-blog community discovery algorithm based on social network analysis that takes both link and topic relations into consideration.

First, this thesis analyzes the structure and characteristics of Sina micro-blog. The biggest difference between Sina micro-blog and other tools is that it employs a social-networking model called “following”. It also differs from other communication tools in that its content is typically brief, in both actual size and aggregate file size, and updates are frequent.

Second, social network theory is used to construct a micro-blog model that describes the specialized link structure of micro-blogs and the similarity of users’ interests. The “small-world” effect, a power-law distribution, and topic homophily have all been confirmed in micro-blogs, which is consistent with the social network model.

Third, the thesis develops a community discovery algorithm based on both topic and link analysis. The label propagation algorithm is a semi-supervised classification method that is fast and efficient. However, when it calculates edge weights, the relationship between users is not taken into consideration, which affects accuracy. Based on link relations and topic similarity, we derive formulas for the relationship between users to calculate their relevancy, and transform this relevancy into edge weights. We then use the improved label propagation algorithm to divide the users into several groups.

Finally, social network analysis is used to analyze the communities quantitatively and qualitatively. We calculate the network’s density and centralization, and then analyze reachability. We use the software “Pajek” to visualize the communities, in order to gain a rational and intuitive understanding of them.
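
The edge-weighting and label propagation steps described above can be sketched as follows. This is a minimal illustration rather than the algorithm developed in the thesis: the abstract does not give the relevancy formulas, so the weighting used here is an assumption. It blends a follow-link score with the cosine similarity of two users' topic vectors, controlled by a hypothetical parameter `alpha`. The function names, the link scores (1.0 for a mutual follow, 0.5 for a one-way follow), the fixed update order, and the toy data are likewise illustrative.

```python
import math
from collections import defaultdict


def cosine(p, q):
    """Cosine similarity of two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def edge_weights(follows, topics, alpha=0.5):
    """Build symmetric edge weights from follow links and topic similarity.

    follows: set of (follower, followee) pairs.
    topics:  dict mapping each user to a topic vector.
    alpha:   assumed blend factor between link score and topic similarity.
    """
    weights = defaultdict(dict)
    for u, v in follows:
        if v in weights[u]:
            continue  # pair already handled via the reverse link
        # Assumption: a mutual follow is a stronger tie than a one-way follow.
        link = 1.0 if (v, u) in follows else 0.5
        w = alpha * link + (1 - alpha) * cosine(topics[u], topics[v])
        weights[u][v] = w
        weights[v][u] = w
    return weights


def label_propagation(weights, max_iter=100):
    """Weighted label propagation: each user adopts the label that carries
    the largest total edge weight among its neighbours."""
    labels = {u: u for u in weights}  # every user starts in its own community
    order = sorted(weights)  # fixed order for reproducibility (often randomized)
    for _ in range(max_iter):
        changed = False
        for u in order:
            score = defaultdict(float)
            for v, w in weights[u].items():
                score[labels[v]] += w
            if not score:
                continue
            # Ties go to the smallest label; switch only on strict improvement.
            best = max(sorted(score), key=score.get)
            if score[best] > score.get(labels[u], 0.0):
                labels[u] = best
                changed = True
        if not changed:
            break
    return labels


# Toy network: two topic groups joined by a single one-way follow (c -> d).
follows = {("a", "b"), ("b", "a"), ("b", "c"), ("a", "c"),
           ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")}
topics = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 0.0],
          "d": [0.0, 1.0], "e": [0.0, 1.0], "f": [0.0, 1.0]}
labels = label_propagation(edge_weights(follows, topics))
```

On this toy network, users a, b, and c end up sharing one label and d, e, and f another, because the bridging link between c and d has low topic similarity and therefore a small weight.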
Keywords/Search Tags: micro-blog, community discovery, topic model, link analysis, homophily, social network analysis