
Research And Design Of Microblog Topic Tracking Method

Posted on: 2015-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
GTID: 2268330425988822
Subject: Communication and Information System
ABSTRACT: Nowadays, the Internet plays an increasingly important role in people's daily lives, and both work and life depend on it. As Internet technology developed, the information platform Twitter emerged in the United States, and in China the Sina and Tencent microblogs soon followed. On a microblog platform, users publish information as brief posts of fewer than 140 characters, and they can also repost and comment on microblogs that interest them. Such an efficient platform can spread a valuable piece of news across the entire network within a few minutes, which greatly speeds up how users obtain the latest news. However, in today's information explosion, people can easily get lost in the vast amount of information. We therefore need a way to integrate and process this information so that people can obtain what they want according to their needs.

This thesis studies microblog text representation. Because microblogs are short, real-time, colloquial, and original, we adapt the original vector space model (VSM) into a representation suited to microblog text. Before processing, microblogs with fewer than N words are filtered out, and after word segmentation all content words are taken as feature words. We also propose T-TFIDF, a weight calculation method tailored to microblog characteristics, which raises the weights of words that appear in the microblog title. With these improvements, microblog text is represented better than with the standard vector space model, and each word receives a weight that reflects its importance in the microblog.

Based on the vector space model, we propose an adaptive microblog topic tracking method based on K-means clustering. Starting from microblogs supplied by the user, it tracks the corresponding topic in a real-time microblog corpus.
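The preprocessing step of the text representation (filtering out short microblogs, then keeping content words as features) can be sketched in Python as follows. This is a minimal illustration, not the thesis's code. It assumes each microblog has already been word-segmented and POS-tagged into `(word, tag)` pairs, treats tags starting with `n`, `v`, or `a` (nouns, verbs, adjectives) as content words, and uses a hypothetical `min_words` parameter for the unspecified N.

```python
from typing import List, Tuple

# Assumed content-word tag prefixes (nouns, verbs, adjectives); the thesis's tag set is not given.
CONTENT_TAG_PREFIXES = {"n", "v", "a"}

def preprocess(microblogs: List[List[Tuple[str, str]]], min_words: int = 5) -> List[List[str]]:
    """Drop microblogs with fewer than min_words tokens; keep content words as features.

    Each microblog is assumed to be pre-segmented and POS-tagged as (word, tag) pairs.
    """
    features = []
    for post in microblogs:
        if len(post) < min_words:
            continue  # too short to carry reliable topic information
        features.append([word for word, tag in post if tag[:1] in CONTENT_TAG_PREFIXES])
    return features

posts = [
    [("heavy", "a"), ("rain", "n"), ("hits", "v"), ("the", "u"), ("capital", "ns"), ("today", "t")],
    [("wow", "e")],
]
print(preprocess(posts))  # [['heavy', 'rain', 'hits', 'capital']]
```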
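The idea behind T-TFIDF, raising the weight of words that appear in the microblog title, can be sketched as ordinary TF-IDF with a title multiplier. The abstract does not give the actual T-TFIDF formula, so the multiplicative `title_boost`, its default of 2.0, the length-normalized TF, the `log(N / df)` IDF, and the representation of each title as a set of words are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Set

def t_tfidf(docs: List[List[str]], titles: List[Set[str]],
            title_boost: float = 2.0) -> List[Dict[str, float]]:
    """TF-IDF vectors in which words that also occur in the post's title are boosted.

    title_boost is a hypothetical multiplier, not the thesis's published formula.
    """
    n = len(docs)
    # Document frequency: number of posts containing each word at least once.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc, title in zip(docs, titles):
        tf = Counter(doc)
        vec = {}
        for word, count in tf.items():
            weight = (count / len(doc)) * math.log(n / df[word])
            if word in title:
                weight *= title_boost  # promote title words
            vec[word] = weight
        vectors.append(vec)
    return vectors
```

A word that appears in every post gets an IDF of zero, so the boost only matters for words that actually discriminate between posts.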
Whether a microblog belongs to the topic is judged by the similarity between the microblog vector and the subtopic vector set, and the subtopic vector set is adjusted dynamically. The method works as follows. When a microblog is judged to be a member of the topic, candidate words are selected from it and their frequencies are counted. If a candidate word's frequency exceeds a threshold, we consider that a new subtopic has emerged: the tracked microblogs are clustered with K-means, and the subtopic vector set is adjusted according to the clustering result. In this way the subtopic vector set evolves with the tracked microblogs, so the topic can be tracked more precisely.

In addition, we study automatic summarization of microblogs. First, microblogs are clustered using the subtopic vector set as the initial cluster centers. Then the weight of each sentence is calculated, and the sentence with the largest weight in each cluster is selected as a summary sentence. Finally, these sentences are sorted in chronological order to produce the final topic summary.

This work has been supported by the National Natural Science Foundation of China under Grants 61172072 and 61271308, the Beijing Natural Science Foundation under Grant 4112045, the Research Fund for the Doctoral Program of Higher Education of China under Grant W11C100030, the Beijing Science and Technology Program under Grant Z121100000312024, and the Beijing Municipal Commission of Education Discipline Construction and Graduate Construction Project.
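The membership test in the tracking method can be illustrated with cosine similarity over sparse term-weight vectors. The abstract does not name the similarity measure or the threshold, so cosine similarity, the "most similar subtopic" rule, and the default `threshold=0.3` are assumptions in this sketch.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def belongs_to_topic(post: Vector, subtopics: List[Vector], threshold: float = 0.3) -> bool:
    """Assumed rule: on-topic if the post is close enough to at least one subtopic vector."""
    best = max((cosine(post, s) for s in subtopics), default=0.0)
    return best >= threshold
```

Comparing against the best-matching subtopic means a post only needs to match one facet of the topic, which is one plausible reason to keep a set of subtopic vectors rather than a single topic vector.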
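The adaptive adjustment of the subtopic vector set can be sketched as below, for posts that have already passed the membership test. The abstract leaves the candidate-word rule open, so this sketch assumes that candidate words are terms not covered by any current subtopic, that "exceeds the threshold" means a count strictly greater than a hypothetical `freq_threshold`, and that re-clustering runs plain Euclidean K-means on all tracked posts with the current subtopics plus the triggering post as initial centers. `SubtopicSet` and `add_on_topic` are illustrative names.

```python
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))

def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {k: w / len(vectors) for k, w in total.items()}

def kmeans(points: List[Vector], centers: List[Vector], iters: int = 10) -> List[Vector]:
    """Euclidean K-means starting from the given centers; returns the final centers."""
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers

class SubtopicSet:
    """Hypothetical adaptive subtopic set; thresholds and the candidate-word rule are assumptions."""

    def __init__(self, subtopics: List[Vector], freq_threshold: int = 2):
        self.subtopics = list(subtopics)
        self.freq_threshold = freq_threshold
        self.tracked: List[Vector] = []
        self.candidate_freq: Counter = Counter()

    def add_on_topic(self, post: Vector) -> bool:
        """Record a post already judged on-topic; return True if the subtopic set was rebuilt."""
        self.tracked.append(post)
        known = set().union(*self.subtopics)
        # Candidate words: terms of the post not covered by any current subtopic.
        self.candidate_freq.update(t for t in post if t not in known)
        if any(c > self.freq_threshold for c in self.candidate_freq.values()):
            # Treat this as a new subtopic: seed an extra center with the current post,
            # re-cluster all tracked posts, and use the final centers as the new subtopic set.
            self.subtopics = kmeans(self.tracked, self.subtopics + [dict(post)])
            self.candidate_freq.clear()
            return True
        return False
```

Seeding K-means with the existing subtopics keeps the old subtopics stable, while the extra center gives the newly frequent words somewhere to settle.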
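The three summarization steps (cluster from the subtopic centers, pick the heaviest sentence per cluster, order chronologically) can be sketched as follows. The abstract does not define the sentence weight, so this sketch treats each microblog as one sentence and scores it by the sum of its term weights. Both choices, the `(timestamp, text, vector)` tuple layout, and the Euclidean K-means update are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]        # sparse vector: term -> weight
Post = Tuple[int, str, Vector]   # (timestamp, text, term-weight vector)

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))

def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {k: w / len(vectors) for k, w in total.items()}

def summarize(posts: List[Post], subtopics: List[Vector], iters: int = 10) -> List[str]:
    """Cluster posts from the subtopic centers, keep the top post per cluster, sort by time."""
    centers = list(subtopics)  # subtopic vectors serve as the initial cluster centers
    clusters: List[List[Post]] = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for post in posts:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(post[2], centers[i]))
            clusters[nearest].append(post)
        updated = [centroid([p[2] for p in c]) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    # Hypothetical sentence weight: the sum of the post's term weights.
    picks = [max(c, key=lambda p: sum(p[2].values())) for c in clusters if c]
    return [text for _, text, _ in sorted(picks, key=lambda p: p[0])]

posts = [
    (3, "Quake hits city", {"quake": 0.9, "city": 0.4}),
    (1, "Strong quake felt", {"quake": 0.5}),
    (2, "Rescue teams arrive", {"rescue": 0.8, "team": 0.3}),
    (4, "More rescue needed", {"rescue": 0.4}),
]
print(summarize(posts, [{"quake": 1.0}, {"rescue": 1.0}]))
# ['Rescue teams arrive', 'Quake hits city']
```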
Keywords/Search Tags: Microblog, Topic tracking, K-means, Adaptive, VSM