
Hot Topics Detected From Micro-bloggings Based On Word Co-occurrence Model

Posted on: 2016-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Cao
Full Text: PDF
GTID: 2308330461994805
Subject: Computer Science and Technology
Abstract/Summary:
This is the age of the Internet, and the Internet has a huge impact on society. With its rapid development, an enormous amount of information is produced online. Weibo (microblog) services are used by billions of people around the world to get information and express their opinions. Since the birth of the Chinese Weibo platform, many hot topics have spread through it, and it has become one of the most important channels for propagating hot topics in China. Some criminals have also used the platform to spread rumors. Detecting hot topics in Chinese Weibo is therefore of great importance for discovering rumors and guiding public opinion.

Chinese Weibo has received much attention recently. Billions of posts are published on the platform every day, so the volume of information is massive. Because Weibo posts are full of irregular writing and are limited to 140 characters, traditional text clustering methods are not directly suitable for detecting hot topics in Chinese Weibo.

Considering the features of Chinese Weibo posts and the way hot topics propagate, this thesis proposes a model for detecting Chinese Weibo hot topics based on a word co-occurrence model. To improve the efficiency of the algorithm, we combine the word co-occurrence model with HowNet, a tool that describes Chinese semantic information. To handle the massive volume of information on the Weibo platform, the Hadoop platform is introduced, and the TF-IDF algorithm and the word co-occurrence algorithm have been ported to it for the same efficiency reasons.

Finally, several experiments were designed and their results analyzed. The experimental results indicate that the proposed method can detect hot topics efficiently.
Keywords/Search Tags:Weibo, hot topic, word co-occurrence, HowNet, Hadoop, TF-IDF
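
The abstract names two building blocks, TF-IDF weighting and word co-occurrence counting, without giving their details. The following is only a minimal single-machine sketch of those two ideas, not the thesis's actual algorithm: it assumes posts are already segmented into words, uses the plain tf * log(N / df) form of TF-IDF, counts co-occurrence at the level of whole posts, and leaves out both the HowNet semantic step and the Hadoop port. The function names and the sample posts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(posts):
    """Return one {word: score} dict per post, using tf * log(N / df).

    Assumption: each post is a list of already-segmented words.
    """
    n = len(posts)
    # Document frequency: number of posts that contain each word.
    df = Counter(word for post in posts for word in set(post))
    scores = []
    for post in posts:
        tf = Counter(post)
        scores.append(
            {w: (c / len(post)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return scores


def cooccurrence(posts):
    """Count, for each unordered word pair, how many posts contain both words."""
    pairs = Counter()
    for post in posts:
        # Sorting makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(post)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical, pre-segmented sample posts (illustration only).
posts = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "sichuan", "donation"],
    ["concert", "ticket"],
]

weights = tf_idf(posts)
pairs = cooccurrence(posts)
print(pairs.most_common(1))
```

In this sketch, word pairs that appear together in many posts are candidates for describing the same topic, and the TF-IDF weights could be used to discard uninformative words before counting pairs. How the thesis actually combines the two steps is not stated in the abstract.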