
Research On The Technology Of Micro-blog Interest Community Discovery And Hot Topic Detection Within Community

Posted on: 2015-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: R F Cui
GTID: 2308330482479207
Subject: Communication and Information System
Abstract/Summary:
Compared with other media, micro-blog is more flexible, convenient, and interactive. It gives every network user a platform for self-expression and has therefore quickly attracted a large number of users. However, its rapid development has brought serious challenges to the supervision of online public opinion. A micro-blog interest community is a non-entity community whose members often share similar interests. Topics within the community are seen preferentially and resonate easily, so they spread quickly and widely. Hot topics within a community are therefore often the root of large-scale public opinion events. It follows that discovering micro-blog interest communities and detecting hot topics within them is significant for public opinion supervision.

At present, the technology of micro-blog interest community discovery and hot topic detection within communities faces the following problems:

(1) Existing methods of interest community discovery assume that user interest is static, which ignores the effect of interest drift.
(2) Communities in the real world are mostly overlapping, but existing methods have low accuracy in discovering overlapping communities.
(3) Hot topic detection suffers from data sparsity caused by short texts, which seriously affects the rationality and sufficiency of the feature space.

This thesis addresses these three problems. Its main contributions are as follows:

1. A micro-blog user interest model based on the forgetting curve is proposed. User interest is not permanently stable; it changes gradually under external stimuli. The longer ago a user paid attention to a message, the weaker the reference value of that message; and the higher a user's interest in a field, the more attention the user pays to that field. These two points can be regarded as the human processes of gradual forgetting and of repeated learning of knowledge. The thesis therefore proposes an interest model of micro-blog users based on the forgetting curve. Experiments show that the model predicts micro-blog users' interests accurately, with a recall rate of 85.3%.

2. A micro-blog community discovery algorithm based on user interest and links is proposed. The thesis maps the interest similarity matrix, obtained from the forgetting-curve-based user interest model, into a virtual interest network and computes its link similarity. It then obtains the total link similarity by combining this with the real attention (follow) relationships among users. To use link similarity for community discovery, the Ward hierarchical clustering algorithm is generalized so that it applies to any objects that have a similarity measure, and the generalized algorithm is then applied to community discovery. Experiments show that the algorithm can discover overlapping micro-blog communities without prior knowledge, with an accuracy of 83.4%.

3. A micro-blog hot topic detection method based on the comments tree is proposed. First, the thesis analyzes the characteristics of micro-blog text and designs a filtering algorithm for spam micro-blog posts. Then, to address data sparsity, it takes full advantage of the tightness of communities and proposes the concept of a micro-blog comments tree together with an evaluation model for hot topics. Finally, building on these two components, it proposes a hot topic detection method within communities. Experiments show that constructing the comments tree improves the performance of the hot topic detection system by 3%.
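
The two observations behind the forgetting-curve interest model in contribution 1 (older attention counts for less, repeated attention to a field reinforces interest) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so everything here is an assumption for illustration: the exponential retention form exp(-age / strength) commonly associated with the Ebbinghaus forgetting curve, the hypothetical function name `interest_profile`, the `strength` parameter, and the use of days as the time unit.

```python
import math
from collections import defaultdict


def interest_profile(events, now, strength=7.0):
    """Return a normalized interest score per field.

    events   -- iterable of (field, timestamp) pairs, one per message the
                user paid attention to; timestamps are in days (assumed unit)
    now      -- current time, in the same unit as the timestamps
    strength -- assumed memory-strength constant; larger means slower decay

    Illustrative only: the decay form is an assumption, not the formula
    used in the thesis.
    """
    if strength <= 0:
        raise ValueError("strength must be positive")

    scores = defaultdict(float)
    for field, timestamp in events:
        age = now - timestamp
        if age < 0:
            continue  # skip events dated after `now`
        # Older attention contributes less (forgetting); repeated attention
        # to the same field accumulates (repeated learning).
        scores[field] += math.exp(-age / strength)

    total = sum(scores.values())
    if total == 0:
        return {}
    return {field: score / total for field, score in scores.items()}


if __name__ == "__main__":
    history = [("sports", 0.0), ("sports", 29.0), ("tech", 30.0)]
    print(interest_profile(history, now=30.0))
```

Profiles of this kind could then be compared pairwise (for example with cosine similarity) to build the interest similarity matrix mentioned in contribution 2, although the similarity measure actually used in the thesis is not stated in the abstract.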
Keywords/Search Tags:micro-blog community, forgetting curve, user interest, link clustering, community discovery, comments tree, topic detection