Research on Burst Topic Detection and Tracking Based on Microblog

Posted on: 2017-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Lv
GTID: 2348330491963234
Subject: Computer technology
Abstract/Summary:
Topic detection and tracking is a method for finding the most discussed topics in large-scale data, and it addresses the increasingly serious problem of information explosion. It helps people find hot topics and follow their latest progress. Microblogging has developed rapidly since 2010. As more and more users share information and discuss topics on microblogs, topic display has gradually become an essential feature of microblog platforms. This thesis therefore proposes a method, tailored to the characteristics of microblogs, for extracting information from the microblog data stream and for detecting and tracking hot topics.

1) A method for filtering invalid microblogs is proposed, based on an analysis of microblog characteristics. Because the user population is complex, user characteristics such as the number of followers and the number of microblogs posted per day are analyzed to filter out advertisers and zombie users. In addition, content analysis is used to filter out a large amount of junk information, e.g., commercial promotion, sharing, and check-in posts. After word segmentation and the removal of stop words and special characters, microblogs with too many or too few words are also filtered out.

2) A method based on time characteristics is designed and implemented to detect hot topics. After preprocessing, the microblogs are sorted in ascending order of time. The Single-Pass clustering algorithm is improved with a modified similarity calculation and a modified topic vector update that exploits user influence, and it is used for preliminary detection of hot topics in large-scale microblog data. The FP-Growth frequent itemset mining algorithm is used to mine frequent feature word sets in order to correct errors made by the Single-Pass (SP) algorithm. An improved K-MEDOIDS algorithm then extracts the final topics, which improves the accuracy and computational efficiency of topic detection.

3) A topic tracking method using multi-topic query vectors based on time characteristics is proposed and implemented. Based on the distribution of microblogs along the time dimension, the microblogs are grouped by time interval, and topics are detected from each segment in ascending time order. Each detected topic is compared with all existing topics and assigned either to an existing topic group or to a new topic group, depending on whether its similarity reaches a threshold. The method significantly increases accuracy while addressing the problem of topic drift.
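As an illustration of the threshold-based assignment that both the detection and tracking steps rely on, the following is a minimal sketch of a plain Single-Pass clustering pass over time-ordered posts. It is not the thesis's improved algorithm: it assumes plain cosine similarity on term-count vectors and a naive centroid update that adds term counts, with no weighting by user influence. The function names, the `(timestamp, tokens)` post format, and the threshold value of 0.3 are all hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(posts, threshold=0.3):
    """Cluster (timestamp, tokens) posts in ascending time order in one pass.

    Each post joins the most similar existing topic if the similarity
    reaches `threshold`; otherwise it starts a new topic.
    """
    topics = []  # each topic: {"centroid": Counter, "posts": [original indices]}
    ordered = sorted(enumerate(posts), key=lambda item: item[1][0])
    for index, (_, tokens) in ordered:
        vector = Counter(tokens)
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vector, topic["centroid"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            # Naive update (an assumption): add the post's term counts.
            best["centroid"].update(vector)
            best["posts"].append(index)
        else:
            topics.append({"centroid": vector, "posts": [index]})
    return topics


if __name__ == "__main__":
    sample = [
        (3, ["earthquake", "rescue", "team"]),
        (1, ["earthquake", "city", "rescue"]),
        (2, ["movie", "premiere", "tonight"]),
    ]
    for topic in single_pass(sample):
        print(topic["posts"], dict(topic["centroid"]))
```

Because each post is compared only against the topics seen so far, the result depends on the order of arrival, which is why the posts are sorted by time before clustering. The same compare-then-assign rule can be applied to topics detected per time segment, with the similarity measure and threshold chosen for the data at hand.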
Keywords/Search Tags:micro-blog, topic detection and tracking, Single-Pass, FP-Growth, K-MEDOIDS