Research Of Hot Topic Detection In The Internet Public Opinion Information

Posted on: 2012-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2298330452961706
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the World Wide Web, the Internet has become a major channel for users to obtain and publish information, and Internet public opinion plays a growing role in guiding our decisions. Detecting and tracking hot topics of Internet Public Opinion (IPO) in a timely and effective way, in order to ensure information security and monitor IPO, is both a critical difficulty and an open challenge in IPO research.

Based on a comprehensive analysis of the research status of topic detection and tracking, this paper applies IPO analysis and processing techniques to a corpus of news reports in order to detect hot topics automatically. The core of the paper covers IPO information acquisition and preprocessing, and hot topic detection and tracking.

Firstly, to address the two major defects of the traditional Web spider, the paper uses several web page analysis strategies to filter out irrelevant pages, which improves the efficiency of IPO information acquisition. In addition, by analyzing web page templates, we present a web information extraction method based on templates and regular expressions that cleans web pages and then saves the useful information to the database server.

Secondly, in the preprocessing of IPO information, we use feature extraction and the vector space model to represent the title and content of each news report, and improve the feature weighting by introducing Named Entity Recognition. We then devise a formula to compute the similarity of two news reports.

Thirdly, the incremental clustering algorithm single-pass is explored, and a method called the incremental k-means clustering algorithm is proposed to address its shortcomings in topic detection by introducing the concept of k-means and news report seeds. Experimental analysis and comparison show that the improved algorithm is effective and feasible for topic detection.

Finally, for hot topic detection, we analyze the characteristics of IPO hot topics and devise a topic hot degree formula that ranks topics by media attention degree and user attention degree. We then examine the background and evolution of hot topics by introducing a topic index.
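
The single-pass clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is the plain textbook single-pass scheme, not the thesis's improved incremental k-means variant, whose details the abstract does not give. The function names (`tf_vector`, `cosine`, `single_pass`), the raw term-frequency weighting (used here in place of the Named-Entity-enhanced weights), and the threshold of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (a stand-in for the thesis's feature weights)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(reports, threshold=0.3):
    """Assign each report to the most similar existing topic, or open a new topic."""
    topics = []  # each topic: {"centroid": Counter, "members": [report indices]}
    for i, text in enumerate(reports):
        vec = tf_vector(text)
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["centroid"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Keep a summed vector as the centroid; cosine similarity ignores scale.
            best["centroid"].update(vec)
        else:
            topics.append({"centroid": Counter(vec), "members": [i]})
    return topics


# Hypothetical toy reports, invented for this example only.
reports = [
    "earthquake hits city rescue teams arrive",
    "rescue teams search city after earthquake",
    "stock market falls as oil prices rise",
    "oil prices rise and stock market falls again",
]
for topic in single_pass(reports):
    print(topic["members"])  # prints [0, 1] then [2, 3]
```

Because each report is compared only with the topics seen so far, the result depends on the order in which reports arrive and on the chosen threshold. Sensitivity of this kind is a commonly cited weakness of single-pass clustering; the abstract does not say which shortcomings the seed-based k-means extension addresses.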
Keywords/Search Tags: Internet Public Opinion, Hot Topic Detection, Topic attention degree, Topic Index