Research On Microblog Hot Topic Detection Method Based On Maximum Tree Partition

Posted on: 2015-11-29   Degree: Master   Type: Thesis
Country: China   Candidate: F Chen
GTID: 2298330422972462   Subject: Computer system architecture
Abstract/Summary:
With the rapid development of traditional Internet technology and mobile Internet technology, the speed and scale at which network information propagates have greatly increased, and the way people communicate has changed. As a rapidly rising Internet medium, microblogging attracts more and more attention. As a platform for message propagation and interactive communication, a microblog service can produce a large amount of information in a short period of time. This makes it easy for users to become absorbed in local microblog information and lose track of the latest developments across the whole microblog community. Faced with this vast volume of microblog information, obtaining the hot topics of the entire microblog community quickly and accurately has become an important research direction.
Although traditional topic detection technology is relatively mature and can help users quickly discover the topics hidden in large amounts of long text, it still has obvious shortcomings when dealing with massive volumes of short microblog texts. First, its computational complexity is too high: computing text similarity across massive numbers of microblog posts is prohibitively expensive for a traditional topic detection system. Second, it loses the semantic information of words: the traditional topic detection model measures document similarity only through repeated words and ignores the semantic relations between words.
To address these problems, this paper proposes a microblog hot topic detection method based on maximum tree partition. The method builds on the relevant theories and algorithms of microblog hot topic detection, an analysis of the strengths and weaknesses of existing microblog hot topic detection methods, and the characteristics of microblog data. Experimental results on the collected microblog data sets verify the effectiveness of the proposed method.
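The second shortcoming (similarity measured only through repeated words) can be illustrated with a minimal sketch of a purely word-overlap measure: cosine similarity over raw term-frequency vectors. This is the generic bag-of-words baseline, not the thesis's improved measure; the function name `cosine_tf` and the sample posts are hypothetical, and English glosses stand in for segmented Chinese words.

```python
import math
from collections import Counter
from typing import List


def cosine_tf(tokens_a: List[str], tokens_b: List[str]) -> float:
    """Cosine similarity of raw term-frequency (bag-of-words) vectors.
    Only words that literally appear in both posts contribute."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical segmented posts (English glosses stand in for Chinese words).
post_1 = ["phone", "price", "drop"]
post_2 = ["mobile", "cost", "fall"]  # same meaning, no shared words
post_3 = ["phone", "price", "rise"]  # opposite meaning, two shared words

print(cosine_tf(post_1, post_2))  # 0.0
print(cosine_tf(post_1, post_3))  # about 0.67
```

Two posts that say the same thing with different words score 0.0, while posts that share surface words score high even when their meanings differ. Applying such a function to every pair of n posts also takes n(n-1)/2 comparisons, which shows how quickly the pairwise cost behind the first shortcoming grows.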
The main contributions of the proposed method are as follows:
① This paper puts forward the idea of detecting microblog topics only from the microblog data published within a given period of time, which meets the requirements of hot topic detection in a real microblog system. It also removes the influence of historical topics when detecting new topics.
② This paper improves the calculation of feature term weights and microblog similarity. By incorporating the semantic similarity between words into existing calculation methods, it reduces the errors caused by polysemy and synonymy in Chinese microblogs and thereby improves accuracy.
③ To reduce the computational scale and improve clustering accuracy, this paper proposes a microblog hot topic detection method based on maximum tree partition. By generating the maximum tree of the fuzzy similarity matrix, it removes noisy similarity data between microblogs and reduces the computing scale. In addition, the improved K-means clustering method determines the number of clusters automatically, which improves clustering accuracy. This paper also proposes a method for calculating the heat of microblog topics, which can be used to rank topics by heat and identify the hot ones.
④ Compared with other microblog topic detection methods, this method achieves higher overall execution efficiency and accuracy. It effectively addresses the low efficiency of traditional topic detection methods on large-scale data.
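Contribution ③ rests on partitioning the maximum (spanning) tree of a fuzzy similarity matrix. The abstract does not define the author's fuzzy similarity measure, threshold rule, or how the tree output feeds the improved K-means, so the following is only a minimal sketch of the generic maximum-tree partition from fuzzy cluster analysis: build a maximum spanning tree with Prim's algorithm, drop tree edges whose similarity falls below a threshold, and read off the connected components. The names `maximum_spanning_tree`, `partition`, and `lam`, and the toy 5×5 matrix, are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]


def maximum_spanning_tree(sim: List[List[float]]) -> List[Edge]:
    """Prim's algorithm on a dense symmetric similarity matrix,
    choosing the strongest link at each step (maximum spanning tree)."""
    n = len(sim)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("-inf")] * n  # strongest known link from each node to the tree
    link = [-1] * n             # tree node providing that link
    best[0] = 0.0               # start the tree at node 0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the outside node with the strongest link to the current tree.
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] != -1:
            edges.append((link[u], u, sim[link[u]][u]))
        for v in range(n):
            if not in_tree[v] and sim[u][v] > best[v]:
                best[v] = sim[u][v]
                link[v] = u
    return edges


def partition(n: int, tree: List[Edge], lam: float) -> List[List[int]]:
    """Cut tree edges with similarity below lam; return the connected components."""
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v, w in tree:
        if w >= lam:  # keep only sufficiently strong links
            root[find(u)] = find(v)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical fuzzy similarity matrix for five microblog posts.
sim = [
    [1.0, 0.8, 0.7, 0.1, 0.2],
    [0.8, 1.0, 0.6, 0.2, 0.1],
    [0.7, 0.6, 1.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 1.0, 0.9],
    [0.2, 0.1, 0.2, 0.9, 1.0],
]
tree = maximum_spanning_tree(sim)
print(tree)                         # [(0, 1, 0.8), (0, 2, 0.7), (2, 3, 0.3), (3, 4, 0.9)]
print(partition(5, tree, lam=0.5))  # [[0, 1, 2], [3, 4]]
```

Because the tree keeps only n-1 of the n(n-1)/2 pairwise similarities, weak links between posts are discarded before any clustering step. As an assumption not stated in the abstract, the number of components obtained at a given `lam` could serve as a candidate cluster count.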
Keywords/Search Tags: Microblog, Hot topic detection, Maximum tree, Semantic similarity, Clustering algorithm