
Community Detection And Analysis On Microblog Data Based On Random Walk

Posted on: 2016-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Tan
GTID: 2348330488974136
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of social media has strongly influenced people's daily lives and the way they share information. With the rise of Sina Weibo in particular, interpersonal communication has shifted from offline to online, and a large number of users now generate rich behavioral data. This data supports research on user recommendation and e-commerce, and community mining on microblog data is one of the important tasks in this area. At present, community mining is mostly applied to biological or social networks, where methods tend to find sets of nodes based on topology or control relationships. Microblog data, however, has its own attributes and context: topics in microblog data typically follow a power-law distribution, so traditional methods cannot be applied directly to mining microblog data for topic recommendation, which is the real user scenario for community mining on microblog networks.

Because hot topics cannot effectively differentiate users, a community detection process leads large numbers of users to join whichever communities those hot topics belong to. This thesis analyzes the distribution of topics and finds that most hot topics contribute little to distinguishing users' personal interests. It therefore introduces TF-IDF to reweight user-topic relationships, increasing the weight of user-topic links with high discriminative power, so that communities tend to cover genuinely important topics and can be used to recommend potential users. Motivated by the diversity of the microblog network and the social, interest-driven interactions among users, this thesis detects overlapping community structure in a network whose nodes include both users and topics.
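The abstract does not state the exact weighting formula. As a minimal sketch, assuming the standard TF-IDF form with each user treated as a "document" and the topics that user posts about as "terms", the reweighting could look like this; the function name `user_topic_tfidf` and its input format are hypothetical.

```python
import math
from collections import Counter


def user_topic_tfidf(user_topics: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Assumed standard TF-IDF over users (documents) and topics (terms).

    user_topics maps a user id to the list of topics that user posted about;
    repeats count toward term frequency. This is an illustrative sketch,
    not the thesis's exact formula.
    """
    n_users = len(user_topics)
    # Document frequency: number of distinct users who mention each topic.
    df = Counter()
    for topics in user_topics.values():
        df.update(set(topics))

    weights = {}
    for user, topics in user_topics.items():
        counts = Counter(topics)
        total = sum(counts.values())
        # tf = share of this user's posts on topic t; idf = log(N / df(t)).
        # A topic used by every user gets idf = log(1) = 0.
        weights[user] = {
            t: (c / total) * math.log(n_users / df[t]) for t, c in counts.items()
        }
    return weights
```

For example, if users `a`, `b`, and `c` all post about `#hot` but only `a` posts about `#go`, the `#hot` edges get weight 0 while `a`'s `#go` edge keeps a positive weight, which is the down-weighting of undiscriminating hot topics described above.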
Because the network contains both user-user and user-topic links, this thesis introduces the random walk with restart algorithm to put distances over structural links and attribute links on a common scale, which makes it possible to apply traditional community detection methods to heterogeneous networks. To compare how adding different kinds of information to the network affects community structure, three networks with different properties are constructed: (1) structural edges (user follows and common follows) with attribute edges without TF-IDF weighting; (2) structural edges (user follows) with TF-IDF-weighted attribute edges; and (3) structural edges (user follows and common follows) with TF-IDF-weighted attribute edges. Overlapping communities are mined on each of the three networks, and the community structure is analyzed from different points of view. Experiments use real microblog data, for which a word segmentation and topic extraction system was built. The results show that users tend to form different communities based on different interests, that the TF-IDF weighting introduced here provides effective guidance for community mining, and that the framework efficiently describes the overlapping community structure in the dataset with good interpretability.
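The abstract names random walk with restart but gives no parameters. The sketch below assumes symmetric edges (the thesis's follow relations are directed), weight 1.0 for structural user-user edges, and TF-IDF scores as weights for user-topic edges; `build_hetero_adjacency`, `random_walk_with_restart`, and the restart probability 0.15 are hypothetical choices, not values from the thesis. The resulting proximity vector scores users and topics on one scale, which is what allows a single community detection method to run over both link types.

```python
import numpy as np


def build_hetero_adjacency(n_users, n_topics, follow_edges, topic_edges):
    """Symmetric adjacency with users 0..n_users-1, then topics after them.

    follow_edges: (u, v) user pairs, given weight 1.0 (structural links).
    topic_edges: (u, t, w) triples linking user u to topic index t with
    weight w, e.g. a TF-IDF score (attribute links).
    """
    n = n_users + n_topics
    adj = np.zeros((n, n))
    for u, v in follow_edges:
        adj[u, v] = adj[v, u] = 1.0
    for u, t, w in topic_edges:
        adj[u, n_users + t] = adj[n_users + t, u] = w
    return adj


def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Return the RWR proximity of every node to `seed` via power iteration."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    # Column-normalize so W[i, j] is the probability of stepping j -> i;
    # nodes with no edges keep an all-zero column.
    W = np.divide(adj, col_sums, out=np.zeros((n, n)), where=col_sums > 0)
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        # Follow an edge with prob. 1 - restart, jump back to the seed otherwise.
        r_next = (1 - restart) * (W @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

For instance, with three users where 0 follows 1, 1 follows 2, users 0 and 1 share topic 0, and user 2 alone posts about topic 1, a walk seeded at user 0 gives topic 0 a higher proximity than topic 1.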
Keywords/Search Tags: Microblogging network, Topic, TF-IDF, Overlapping community, Random walk