Research On Local Interested Community Detection In Large-Scale Social Network

Posted on: 2015-01-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H J Yin
GTID: 1268330428484466
Subject: Computer software and theory
Abstract/Summary:
With the advent of the Web 2.0 era, more and more data is published online, and users increasingly interact with one another over the network. People are both producers and consumers of network data, and their work, daily life, study, and entertainment are increasingly inseparable from the Internet. Social networks carry real-world relationships onto the Internet, strengthening communication and interaction between people and speeding the flow of information around the world. Social networking sites, with Facebook as a leading example, attract the attention of a growing number of people. Facebook is a social network built on strong ties that helps friends maintain and improve their relationships; Twitter is a weak-tie social network that favors opinion leaders and the rapid dissemination of information, which suits advertising and marketing; LinkedIn is a professional social platform that helps business people expand their business, recruit, and carry out other business communication. There are also many social networks in China, such as Fan Fou, Di Gu, Suixin Weibo, Sohu Weibo, Follow5, Sina Weibo, Tencent Weibo, NetEase Weibo, Pin Pin Mi, classmates network, MySpace, 9911, and Baidu I Tie, of which the best known, Sina Weibo, is similar to Twitter. As of December 2012, Sina Weibo had reached 500 million users; as of July 2012, Twitter had reached 517 million users; and Facebook, another world-renowned social networking site, had reached 1 billion users. According to monitoring data released by the well-known foreign company Pingdom, social networking links and web plugins account for 25 percent of all network traffic worldwide, and social networks have billions of users worldwide.
In social network analysis, finding the various communities in a social network is very important for product recommendation, targeted advertising, friend recommendation, and partitioning the network. Based on an analysis of the development of and research on large-scale social networks, this dissertation studies in depth how to effectively mine communities of interest in large-scale social networks. It first studies two sub-problems: personalized user interest modeling and efficient computation of personalized PageRank. Once interest modeling and efficient personalized PageRank computation are in place, we detect interest communities on large-scale social networks.

First, we use users' friend relationships and the microblogs that users publish and forward as interest information. Treating ordinary users and specific users differently, we propose a three-level model that uses followed objects as interests for ordinary users, and a two-level model that uses published microblogs as interests for specific users. To use microblog content for interest modeling, we propose an improved LDA-based microblog interest classification. To handle changes in user interest, we propose a Bayesian method that uses the content of a user's microblogs as feedback. We further propose a user preference model for the purpose of interest community detection. Finally, we evaluate the model using user tags as a reference; with adequate user tags, the model achieves precision and recall above 80%.

Second, personalized PageRank is an important algorithm in information retrieval and data mining. As data sizes grow, the algorithm needs to be optimized and accelerated. Because the traditional iterative method is relatively expensive in both time and space, we use a method based on Monte Carlo random walks.
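The Monte Carlo idea can be illustrated with a minimal single-machine sketch. It is not the dissertation's distributed algorithm; the function name `monte_carlo_ppr`, the restart probability `alpha`, the adjacency-dictionary graph format, and the choice to end a walk at a node with no out-links are all assumptions made for illustration. Each walk starts at the source user, stops with probability `alpha` at every step, and the fraction of walks that end at a node estimates that node's personalized PageRank score.

```python
import random
from collections import Counter


def monte_carlo_ppr(graph, source, num_walks=10000, alpha=0.15, seed=0):
    """Estimate personalized PageRank scores for `source` by random walks.

    `graph` maps each node to a list of its out-neighbors (assumed format).
    `alpha` is the per-step stopping (restart) probability (assumed value).
    """
    rng = random.Random(seed)
    endpoints = Counter()
    for _ in range(num_walks):
        node = source
        # Continue the walk with probability 1 - alpha at every step.
        while rng.random() >= alpha:
            neighbors = graph.get(node)
            if not neighbors:
                # Assumption: a walk that reaches a node without
                # out-links simply ends there.
                break
            node = rng.choice(neighbors)
        endpoints[node] += 1
    # The empirical distribution of walk endpoints is the estimate.
    return {node: count / num_walks for node, count in endpoints.items()}


# Hypothetical toy graph: "d" links to "a" but nothing links to "d".
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a"]}
toy_scores = monte_carlo_ppr(toy_graph, "a")
```

Unlike the iterative method, this estimator never materializes a score vector over the whole graph: it only touches nodes the walks actually visit, and independent walks can be run in parallel and their endpoint counts merged afterwards.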
MapReduce is well suited to data-intensive computing but not to computations that require a large number of iterations, so this dissertation presents a distributed algorithm based on MPI. We improve the previous two-way merge method into a Fibonacci-based one; in theory this improves performance by about 30%, and in a large number of experiments on real data it improves performance by 10% to 40% relative to the basic method.

Finally, because a community carries both the structure among its members and the members' personalized information, we present a community detection method based on personalized PageRank that considers structural information as well as the characteristics of the nodes themselves. To cope with the growing mass of social network data, we propose a local community analysis method and adapt the algorithm to run on the MapReduce distributed computing framework. Most community detection methods are not suitable for social networks with ten million or more users, and Metis is one of the few tools able to analyze networks at that scale. We compare Metis with the proposed method; the proposed method detects communities better and finds strongly clustered local communities. In addition, MapReduce experiments demonstrate the scalability and efficiency of the method.
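The abstract does not say how a community is extracted from personalized PageRank scores. One common technique, used here purely as an illustrative stand-in, is a conductance sweep: rank nodes by score divided by degree and keep the prefix with the lowest conductance. The sketch below assumes an undirected graph stored as an adjacency dictionary in which every node has at least one neighbor, uses a plain power iteration for the scores, and ignores the node-level interest features that the dissertation also takes into account. All function names are hypothetical.

```python
def personalized_pagerank(graph, seed, alpha=0.15, iterations=100):
    """Power iteration for personalized PageRank restarting at `seed`."""
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[seed] += alpha  # restart mass returns to the seed
        for node, neighbors in graph.items():
            share = (1 - alpha) * scores[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        scores = nxt
    return scores


def conductance(graph, community):
    """Edges leaving `community` divided by the smaller side's volume."""
    members = set(community)
    cut = sum(1 for u in members for v in graph[u] if v not in members)
    volume = sum(len(graph[u]) for u in members)
    total = sum(len(neighbors) for neighbors in graph.values())
    denom = min(volume, total - volume)
    return cut / denom if denom > 0 else 1.0


def sweep_cut(graph, scores):
    """Return the lowest-conductance prefix of the degree-normalized ranking."""
    ranked = sorted(graph, key=lambda n: scores[n] / len(graph[n]), reverse=True)
    best, best_phi = set(), float("inf")
    # Skip the full node set, whose complement is empty.
    for k in range(1, len(ranked)):
        phi = conductance(graph, ranked[:k])
        if phi < best_phi:
            best, best_phi = set(ranked[:k]), phi
    return best, best_phi


# Hypothetical toy graph: two triangles joined by the edge c-d.
toy_graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
toy_community, toy_phi = sweep_cut(toy_graph, personalized_pagerank(toy_graph, "a"))
```

The result is local in the sense that it depends only on the seed user and the nodes that receive noticeable score, which is what makes this family of methods attractive when the whole network is too large to partition globally.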
Keywords/Search Tags: social network, user interest modeling, community detection, microblogging marketing, personalized PageRank