
Community Discovery And Application Based On Topic Model

Posted on: 2019-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Qin
GTID: 2428330566974011
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, people's social circles keep expanding and are no longer constrained by time, space, or geography. Weibo, WeChat, and other social platforms have gradually become an indispensable part of people's lives. Every user in a social network continually generates a large amount of data, which contains both behavioral information and content information: the behavioral information implies the structural relationships among users in the network, while the content information, such as basic profile information and published posts, implies the user's interests. For users facing a massive user base, search alone is a weak way to find groups that match their own interests; for businesses, the question is how to find groups of similar users and recommend relevant products to them. Microblog community discovery is therefore particularly important, and it also provides theoretical support for applications such as advertising recommendation and public opinion monitoring.

This thesis first reviews current research methods and classic algorithms for community discovery. Second, it analyzes the characteristics of microblog users' posts: the content is sparse, contains many colloquialisms, Internet slang, emoticons, and so on, and is usually under 140 characters. As a result, applying the LDA topic model directly to extract user interests often gives unsatisfactory results. The thesis therefore addresses noise in the posts: it expands each user's post text and removes noise from it to improve text quality, and it merges related documents to overcome the shortness of individual posts. On this basis, a user-interest-keyword model is proposed that combines TF-IDF keyword extraction with the LDA topic model to obtain each user's interest distribution on Weibo. The model is solved with the Gibbs sampling algorithm, which yields the user-interest probability distribution and the interest-keyword probability distribution. Experiments show that the model performs better than the plain LDA topic model on the Weibo platform.

Furthermore, in analyzing the advantages and disadvantages of common community discovery algorithms, the thesis points out that the label propagation algorithm (LPA) has low time complexity, does not need the number of communities to be set in advance, has a simple calculation process, and handles large complex networks efficiently. During label propagation, however, the algorithm considers neither the structural similarity of adjacent nodes in the network nor the content similarity between nodes. The thesis therefore proposes a multi-feature fusion label propagation algorithm from the perspective of node similarity. The algorithm uses SimRank to calculate the structural similarity of nodes in the network and fuses it with the content similarity of the nodes, that is, the similarity of their user interest distributions. The fused similarity distinguishes the labels transmitted from different adjacent nodes by giving them different weights: the more similar two nodes are, the greater the weight of the label. Experimental comparison shows that the algorithm performs better than the traditional label propagation algorithm.

Finally, based on the above methods, the thesis designs and implements a community discovery system. The system provides data acquisition, text preprocessing, user interest extraction, community discovery, and visual presentation.
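
The weighted label propagation idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the linear fusion with a parameter `alpha`, the fixed update order, and the smallest-label tie-break are all assumptions made for illustration, and the structural similarity is passed in as a precomputed dictionary rather than being computed with SimRank.

```python
import math
from collections import defaultdict


def cosine(p, q):
    """Cosine similarity between two equal-length interest vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def fused_weight(u, v, struct_sim, interests, alpha=0.5):
    """Assumed fusion rule: linear mix of structural and content similarity."""
    s = struct_sim.get((u, v), struct_sim.get((v, u), 0.0))
    c = cosine(interests[u], interests[v])
    return alpha * s + (1 - alpha) * c


def weighted_lpa(adj, struct_sim, interests, alpha=0.5, max_iter=50):
    """Label propagation in which each neighbour's vote is weighted by similarity."""
    labels = {node: node for node in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        # Fixed order keeps the sketch reproducible; standard LPA shuffles nodes.
        for node in sorted(adj):
            votes = defaultdict(float)
            for nb in adj[node]:
                votes[labels[nb]] += fused_weight(node, nb, struct_sim, interests, alpha)
            if not votes:
                continue
            # Largest total weight wins; ties go to the smallest label (a simplification).
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy network: two triangles joined by the bridge edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
# Made-up structural similarities (stand-ins for SimRank scores).
struct_sim = {(0, 1): 0.8, (0, 2): 0.8, (1, 2): 0.8,
              (3, 4): 0.8, (3, 5): 0.8, (4, 5): 0.8, (2, 3): 0.1}
# Made-up interest distributions over two topics.
interests = {0: [0.9, 0.1], 1: [0.9, 0.1], 2: [0.9, 0.1],
             3: [0.1, 0.9], 4: [0.1, 0.9], 5: [0.1, 0.9]}

communities = weighted_lpa(adj, struct_sim, interests)
print(communities)
```

On this toy input, nodes 0 to 2 end up sharing one label and nodes 3 to 5 another, because the bridge edge carries a much smaller fused weight than the edges inside each triangle.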
Keywords/Search Tags: LDA, LPA, TF-IDF, SimRank, Community discovery