Research On Hot Spots Detection Of Netnews

Posted on:2016-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:X Wang

GTID:2308330479478388

Subject:Computer technology

Abstract/Summary:

As the number of Internet users grows, the Internet has become one of the most important platforms for following current affairs and politics and for expressing opinions. Compared with traditional media, online public opinion offers more comprehensive content and faster updates, and it has become an important form of social public opinion. Identifying hot topics online is therefore important for grasping popular opinion in a timely manner. This thesis studies hot spot detection in online news, and its main work is as follows.

New word mining is a basis of Chinese natural language processing. Unlike Indo-European languages, Chinese has no special symbols to mark word boundaries, so any adjacent characters may combine to form a word. This thesis proposes an improved association rule algorithm for mining new words from online news headlines, which takes frequent string sets as input in adjacent, ordered form. For long multi-character terms, the thesis gives a method for contracting (reducing) the support degree. For hot news mining, the thesis computes the similarity of strings using mutual information and then calculates the hot degree of each keyword set.

When a news corpus is selected, the reports contain a great deal of repetition. This thesis improves the traditional automatic abstracting algorithm: binary classifiers first make a preliminary judgment of which sentences are event sentences, and the event sentences then serve as the candidate set of summary sentences, which greatly reduces computing time. Because the volume of data is large, processing the full news content is computationally inefficient. For the news collected by a crawler, the thesis therefore summarizes the news content and uses a certain percentage of summary sentences together with the headlines as the experimental corpus.

Handling data from multiple news sites raises problems of large data volume and processing difficulty. To address them, the thesis first ranks the news of each individual site using the method described above, and then ranks the per-site rankings with a top-N algorithm. The corpus consists of news from Netease, Sohu and Sina from February 25, 2013 to March 31, 2015. The experimental results show that the approach is efficient.
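
The abstract does not say how mutual information is computed, so the following is only a minimal sketch of one common choice, pointwise mutual information (PMI), applied to two adjacent strings. The function name `pmi`, the headline-level probability estimates and the toy headlines are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def pmi(left, right, headlines):
    """Pointwise mutual information of two adjacent strings.

    Assumption: each probability is estimated as the fraction of
    headlines that contain the string; the thesis may estimate it
    differently.
    """
    if not headlines:
        raise ValueError("headlines must not be empty")
    n = len(headlines)
    p_left = sum(left in h for h in headlines) / n
    p_right = sum(right in h for h in headlines) / n
    # Joint event: the two strings occur next to each other, in order.
    p_joint = sum((left + right) in h for h in headlines) / n
    if p_joint == 0:
        # The strings never appear adjacently in this corpus.
        return float("-inf")
    return math.log2(p_joint / (p_left * p_right))


if __name__ == "__main__":
    # Hypothetical toy headlines, not data from the thesis.
    toy_headlines = ["北京雾霾预警", "雾霾持续", "大雾天气", "股市上涨"]
    print(pmi("雾", "霾", toy_headlines))
```

A higher score means the two strings occur together more often than their individual frequencies would suggest, which is the usual reason for treating the pair as a candidate word or as strongly related keywords.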

Keywords/Search Tags:

Automatic Abstracting, Association Rules, Mutual Information, Hot Degree, top-N

Related items

1	Research On Technology Of Automatic Text Summarization Based On Multiple Word Co-occurrence And Mutual Information
2	Mining Association Rules Of Micrornas Based On Information Entropy
3	Research On Quantitative Association Rules Model And Algorithm
4	Design And Research Of Personalized Automatic Abstracting
5	The Research On Commodity Association Rules Mining
6	An Association Rules-based Six Degree Separation System: Design And Implementation
7	The Research On The Algorithm Of Mining Quantitative Association Rules
8	Research And Implementation Of Chinese Automatic Abstracting
9	Association Rules Detecting Based On Attribute Topology
10	Research Of The Distance-Based Quantitative Association Rules Algorithms