Identification Of The Same News Event

Posted on: 2018-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
GTID: 2348330539985817
Subject: Master of Engineering - Software Engineering
Abstract/Summary:
With the rapid development of the Internet, online news has become an important way for people to get news and information. Compared with traditional news media, online news offers more comprehensive content, spreads faster, and reaches a wider audience. It is also more convenient to access, which helps people follow public opinion. News event recognition is the basis of public opinion analysis. Many news articles may describe the same event, so recognizing articles that describe the same event or carry similar information is an important basis for hot news analysis. This thesis studies the recognition of online news articles that describe the same event. To achieve more accurate news analysis, it first identifies the articles that describe the same event and then calculates the attention that the event receives. The main work is as follows:
1. A separate web crawler is designed for each of five news websites (Sina, Sohu, Netease, Tencent, and Phoenix Net) to collect their news data effectively. The title, text, time, and number of comments of each article are extracted and stored in a database.
2. Cluster analysis is carried out on the stored data after preprocessing. The data is first preprocessed by denoising, filtering stop words, and removing duplicates; the K-means algorithm is then used to cluster articles that describe the same or similar events, so that similar news can be recognized.
3. Articles that describe the same event are merged on the basis of keyword analysis (time, place, people, and category) and the TF-IDF weighting algorithm, and the overall attention of the event is then calculated.
The methods are verified using news data from the five websites above as the corpus. The experimental results verify the validity of these methods.
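The clustering and attention steps described in the abstract can be illustrated with a small sketch. It is not the thesis implementation. It assumes the articles are already tokenized and cleaned, uses a smoothed TF-IDF weighting, and clusters unit-length vectors by cosine similarity (a spherical variant of K-means with deterministic seeding). The attention score is assumed to be the sum of comment counts per cluster. All function names and sample data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into L2-normalized sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption; the thesis does not give its formula).
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (cosine when both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Normalized sum of sparse vectors, used as a cluster centroid."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Deterministic farthest-first seeding, starting from the first vector.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


def cluster_attention(labels, comment_counts):
    """Assumed attention score: total comment count of each cluster."""
    attention = Counter()
    for lab, count in zip(labels, comment_counts):
        attention[lab] += count
    return dict(attention)


# Hypothetical, already-tokenized articles and their comment counts.
docs = [
    ["earthquake", "hits", "city", "rescue"],
    ["city", "earthquake", "rescue", "teams", "arrive"],
    ["team", "wins", "football", "final"],
    ["football", "final", "won", "by", "team"],
]
comments = [120, 80, 40, 10]

labels = kmeans(tfidf_vectors(docs), k=2)
attention = cluster_attention(labels, comments)
print(labels, attention)
```

In this toy corpus the two earthquake articles share one label and the two football articles share the other, and each cluster's attention is the sum of its articles' comment counts. Real news text would need word segmentation and stop-word filtering before this step, and the number of clusters would have to be chosen for the data.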
Keywords/Search Tags: Web crawler, Clustering analysis, Keyword identification, TF-IDF algorithm, Same news events