
Research On Microblog Topic Detection And Tracking

Posted on: 2013-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: H C Zou
Full Text: PDF
GTID: 2248330395480573
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, microblogging, as a new kind of Internet media, is being adopted by more and more people. Topic Detection and Tracking (TDT) techniques study how to classify massive volumes of microblog information reasonably and how to find and track important information in a timely manner, and TDT has become one of the current research hot spots. This dissertation therefore studies microblog tweet scale prediction, microblog data cleansing, and microblog topic detection and tracking. The main research results are as follows:

(1) For microblog tweet scale prediction, based on three characteristics of microblog posting behaviour (random, independent and orderly), a nonhomogeneous Poisson process (NHPP) model for predicting tweet scale is established and solved using actual data. The experiment demonstrates the feasibility and rationality of the NHPP prediction model and shows that it predicts better than the GM(1,1) model.

(2) For microblog data cleansing, to address the data quality problems caused by the colloquial and non-standard nature of microblog language, three cleansing algorithms (the centroid, degree-epicenter value and eigenvector-epicenter value algorithms) are applied to cleanse microblog data. The cleansing effects of the three algorithms are compared and analysed against the quality indicators of normativity, relevance and helpfulness. On that basis, a framework for a microblog data cleansing system is devised. The experiment shows that the quality indicator values of microblog data increase by more than 20 percent on average after cleansing.

(3) For microblog topic detection, to address the feature sparsity of microblog data, the MB-SinglePass microblog topic detection algorithm is proposed. The algorithm extends the features using a synonym thesaurus, proposes a combined similarity strategy that merges cosine similarity, Jaccard similarity and semantic similarity, and applies dual threshold and dynamic topic model strategies. It also performs topic detection using microblog structural information, such as mutual-follow relationships between contacts and the internal connections between tweets, such as forwarding and commenting. The experiment shows that detection with the combined similarity strategy is better than detection with a single similarity strategy. Compared with the MB-InC and MB-InK detection algorithms, the MB-SinglePass algorithm shows better detection performance.

(4) For microblog topic tracking, to address the sparsity of training samples, the SA-MBLDA microblog topic tracking method is proposed. The method introduces hidden variables for the topic interests of relevant people and uses the topic connections between an original tweet and its retweets or comments, based on the idea of topic probability, to construct the topic training model. It selects the tweets that participate in topic model reconstruction by setting a feedback threshold on the degree of correlation, employs a dynamic feedback step strategy, and thereby reconstructs the topic model adaptively. It also weights the new and old topic models in order to reduce the error of topic model reconstruction. The experiment shows that the tracking performance of the SA-MBLDA method is better than that of a tracking method based on the LDA model.
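As a rough illustration of the single-pass idea behind item (3), the sketch below clusters tweets in one pass using a similarity that blends cosine and Jaccard scores. It is not the MB-SinglePass algorithm itself: the semantic similarity term, synonym expansion, dual thresholds and structural information described in the abstract are omitted, and the names `combined_similarity`, `single_pass`, the weight `alpha` and the single `threshold` are assumptions made only for this example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity between the term sets of two vectors."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def combined_similarity(a: Counter, b: Counter, alpha: float = 0.5) -> float:
    """Weighted blend of cosine and Jaccard (alpha is a hypothetical weight)."""
    return alpha * cosine(a, b) + (1 - alpha) * jaccard(a, b)


def single_pass(tweets, threshold: float = 0.3, alpha: float = 0.5):
    """Assign each tweet to the most similar existing topic, or open a new one.

    Returns a list of topics, each a list of tweet indices.
    """
    topics = []  # each topic: {"centroid": Counter, "members": [indices]}
    for i, text in enumerate(tweets):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for topic in topics:
            sim = combined_similarity(vec, topic["centroid"], alpha)
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            # Fold the tweet's terms into the topic's running term counts.
            best["centroid"].update(vec)
            best["members"].append(i)
        else:
            topics.append({"centroid": Counter(vec), "members": [i]})
    return [t["members"] for t in topics]


if __name__ == "__main__":
    sample = [
        "earthquake hits city today",
        "city earthquake rescue today",
        "new phone release price",
        "phone price drop new",
    ]
    print(single_pass(sample))
```

Because each tweet is compared only against the topics seen so far, the result depends on arrival order and on the chosen threshold; a lower threshold merges more tweets into existing topics.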
Keywords/Search Tags:Microblog, Nonhomogeneous Poisson Process, Data Cleansing, Topic Detection, Topic Tracking