Research On Hot Spots Detection Of Netnews

Posted on: 2016-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2308330479478388
Subject: Computer technology
Abstract/Summary:
With the growth in the number of Internet users, the Internet has become one of the most important platforms for following current affairs and politics and for expressing opinions. Compared with traditional media, network public opinion offers more comprehensive content and a faster update rate, and it has become an important form of social public opinion. Identifying hot topics on the network is therefore important for grasping popular opinion in a timely manner. This paper studies hot spot detection in network news. The main work is as follows.

New word mining is the basis of Chinese natural language processing. Unlike Indo-European languages, Chinese has no special symbols to mark word boundaries, so any adjacent characters may combine to form a word. This paper proposes an improved association rule algorithm that mines new words from the headlines of network news, taking adjacent, ordered frequent string sets as input. For long multi-part terms, the paper gives a method for contracting the support degree. For hot news mining, it gives a method that computes the similarity of strings using mutual information and then calculates the hot degree of each keyword set.

The selected news corpus contains a great deal of repetition across reports. This paper improves the traditional automatic abstracting algorithm: binary classifiers first make a preliminary judgment of which sentences are event sentences, and the event sentences then serve as the candidate set of summary sentences, which greatly reduces computing time. Because the volume of data is large, computational efficiency is otherwise too low. For the network news collected by the crawler, this paper therefore summarizes the news content and uses a certain percentage of summary sentences together with the headlines as the experimental corpus.

News data from multiple news sites is large in volume and difficult to process. To address this, this paper proposes a method that first computes the news ranking of each individual site using the approach described above and then combines the per-site rankings with a top-N algorithm. The corpus consists of news from Netease, Sohu and Sina covering February 25, 2013 to March 31, 2015. The experimental results show that the approach is efficient.
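The cross-site ranking step described above might be sketched as follows. This is only an illustration, not the thesis's implementation: the function name `top_n_across_sites`, the data layout, and the sample topics and scores are hypothetical, and the sketch assumes that hot degrees computed separately for each site are directly comparable.

```python
import heapq


def top_n_across_sites(site_rankings, n):
    """Return the n hottest entries across all sites as (score, site, topic).

    site_rankings maps a site name to a list of (topic, hot_degree) pairs.
    Assumption: hot degrees from different sites share a common scale.
    """
    candidates = (
        (score, site, topic)
        for site, ranking in site_rankings.items()
        for topic, score in ranking
    )
    # nlargest keeps only n items in memory and returns them in descending order.
    return heapq.nlargest(n, candidates)


# Hypothetical per-site rankings; the topics and scores are made up.
site_rankings = {
    "netease": [("topic_a", 9.0), ("topic_b", 4.0)],
    "sohu": [("topic_c", 7.5), ("topic_d", 1.0)],
    "sina": [("topic_e", 8.0)],
}
top3 = top_n_across_sites(site_rankings, 3)
```

Ranking each site first and merging only the per-site results keeps the final step small, which matches the stated motivation of coping with large data volumes.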
Keywords/Search Tags: Automatic Abstracting, Association Rules, Mutual Information, Hot Degree, top-N
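The abstract does not give the formula used for the mutual-information similarity between strings. As a hedged illustration, the standard pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), can be estimated from headline co-occurrence. The function name `pmi`, the use of headline-level co-occurrence counts, and the sample headlines are assumptions made for this sketch.

```python
import math


def pmi(headlines, x, y):
    """Pointwise mutual information of strings x and y over a list of headlines.

    Probabilities are estimated as the fraction of headlines containing each
    string (plain substring matching, so no word segmentation is needed).
    Returns -inf when x and y never occur in the same headline.
    """
    n = len(headlines)
    count_x = sum(x in h for h in headlines)
    count_y = sum(y in h for h in headlines)
    count_xy = sum(x in h and y in h for h in headlines)
    if n == 0 or count_xy == 0:
        return float("-inf")
    # p(x, y) / (p(x) * p(y)) simplifies to count_xy * n / (count_x * count_y).
    return math.log2(count_xy * n / (count_x * count_y))


# Hypothetical two-headline corpus with placeholder strings.
score = pmi(["ab cd", "ef gh"], "ab", "cd")
```

A higher value indicates that two strings appear together more often than chance would predict, which is the property that makes mutual information usable as a similarity signal when grouping keywords.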