Research On Multi-view Topic Detection Method In Twitter

Posted on: 2013-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Fang  Full Text: PDF
GTID: 2268330392469039  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer and Internet technologies, netizens are paying more and more attention to social media. Micro-blogging, a new kind of social media, has recently been widely recognized and accepted, and many important events now reach people first through it. Many hot topics are embedded in the massive volume of short texts on micro-blogging platforms, and automatic topic detection techniques can give people a comprehensive, up-to-date view of them. However, traditional topic detection algorithms are not well suited to micro-blogging because of the extreme limit on the number of words in each text. To address this problem, this paper studies a new topic detection method based on the multi-view technique. It combines the semantic relation and the social relation among tweets and achieves good topic detection performance. The main work and contributions of this paper are as follows.

Firstly, this paper proposes a new method for topic detection using the multi-view technique. To measure the relations among tweets, this paper uses not only the traditional text semantic relation but also the social relation among tweets, which can make up for the deficiencies of the semantic relation. The semantic relation and the social relation are treated as two views of the same tweets. A multi-view clustering algorithm based on spectral clustering is then used to cluster the tweets, and the keywords extracted from each cluster are regarded as topics. Experimental results show that multi-view clustering performs much better than any single-view clustering.

Secondly, to better measure the semantic relation among tweets, we propose a new weighted document similarity calculation method that uses phrases detected by a suffix tree. In this method, we first use the suffix tree to detect the common phrases among tweets. Since a phrase is often much more meaningful than a random combination of single words, we can better measure the similarity among documents by assigning extra weight to the words in phrases. Experimental results show that assigning extra weight to words in phrases yields better clustering performance.

Thirdly, to measure the social relation among tweets, we propose to use the special social symbols of the Twitter platform, such as #Mention, @reply, etc. Experimental results show that this is indeed a useful way to measure the relation among tweets.

Finally, a Twitter topic detection system based on our multi-view topic detection method is designed and implemented. It provides a tool for topic detection using the multi-view technique, as well as a basic platform for further research and algorithm implementation.
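
The two views described in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation, and every function name and parameter in it is hypothetical. Three simplifications are assumed: phrases are approximated by word bigrams shared by at least two tweets instead of being detected with a suffix tree, the social view is a Jaccard overlap of `#` and `@` symbols, and the two views are merged by a simple weighted average (`alpha`) rather than by the spectral multi-view clustering algorithm the thesis uses.

```python
import math
import re
from collections import Counter


def tokens(tweet):
    """Lowercased word tokens, ignoring URLs, #tags and @names."""
    text = re.sub(r"https?://\S+|[#@]\w+", " ", tweet.lower())
    return re.findall(r"[a-z0-9']+", text)


def social_symbols(tweet):
    """Set of #... and @... symbols in a tweet (lowercased)."""
    return set(re.findall(r"[#@]\w+", tweet.lower()))


def shared_bigrams(token_lists):
    """Bigrams found in at least two tweets (stand-in for suffix-tree phrases)."""
    doc_freq = Counter()
    for toks in token_lists:
        doc_freq.update(set(zip(toks, toks[1:])))
    return {bigram for bigram, df in doc_freq.items() if df >= 2}


def weighted_vector(toks, phrases, bonus=1.0):
    """Term counts, plus `bonus` for each shared phrase a word takes part in."""
    vec = Counter()
    for word in toks:
        vec[word] += 1.0
    for bigram in zip(toks, toks[1:]):
        if bigram in phrases:
            for word in bigram:
                vec[word] += bonus
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 if both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_similarity(tweets, alpha=0.5, bonus=1.0):
    """Weighted average of the semantic view and the social view."""
    token_lists = [tokens(t) for t in tweets]
    phrases = shared_bigrams(token_lists)
    vecs = [weighted_vector(t, phrases, bonus) for t in token_lists]
    syms = [social_symbols(t) for t in tweets]
    n = len(tweets)
    return [
        [
            alpha * cosine(vecs[i], vecs[j])
            + (1 - alpha) * jaccard(syms[i], syms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


tweets = [
    "Earthquake hits Tokyo tonight #quake @nhk",
    "earthquake hits osaka #quake",
    "new phone launch today #tech",
]
similarity = combined_similarity(tweets)
```

In this sketch the first two tweets score higher with each other than with the third, because they share both a phrase and a symbol. A matrix like `similarity` could then be passed as a precomputed affinity to a spectral clustering routine.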
Keywords/Search Tags: Twitter topic detection, multi-view clustering, suffix tree