Online New Event Detection For Large Scale Dataset

Posted on: 2015-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Cai
Full Text: PDF
GTID: 2268330425485343
Subject: Computer application technology
Abstract/Summary:
The task of New Event Detection (NED) is to identify, in a chronological stream of news reports, the first report of the seed event of each news topic. With the rapid development of the Internet, the volume of news reports has grown enormously, and traditional NED algorithms have run into performance bottlenecks. Cloud computing technologies such as Hadoop make it possible to address these bottlenecks when handling large-scale datasets, so it is important to develop an efficient NED algorithm suited to a distributed computing environment.

The thesis develops an improved NED algorithm by analyzing the performance bottlenecks of traditional algorithms and the strategies available for parallelizing them. It then studies the implementation of the improved algorithm on the Hadoop distributed platform, which handles the large datasets of news reports. The thesis first introduces the background, research significance, and current status of NED. It then describes the key technologies used in traditional NED and the MapReduce framework. Next, it proposes an improved NED algorithm that uses an inverted index to reduce time complexity and parallelizes some of its procedures to run faster. The thesis also designs and implements a distributed NED system based on MapReduce. Finally, a set of experiments demonstrates the feasibility and effectiveness of the new algorithm.
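The abstract does not give the details of the improved algorithm, so the following is only a minimal single-machine sketch of the general idea it names: single-pass detection in which an inverted index limits similarity comparisons to earlier reports that share at least one term with the incoming report. The class name `InvertedIndexNED`, the raw term-frequency weighting, the cosine similarity measure, and the threshold value are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer (an assumption): lowercase, split on whitespace."""
    return text.lower().split()


class InvertedIndexNED:
    """Hypothetical single-pass new event detector backed by an inverted index."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold        # illustrative value, not from the thesis
        self.index = defaultdict(set)     # term -> ids of earlier reports containing it
        self.vectors = []                 # term-frequency vector of each earlier report
        self.norms = []                   # Euclidean norm of each vector

    def process(self, text):
        """Process one report in arrival order; return True if it looks like a new event."""
        tf = Counter(tokenize(text))
        norm = math.sqrt(sum(v * v for v in tf.values()))

        # Only earlier reports sharing a term can have non-zero cosine similarity,
        # so the inverted index yields the candidate set without scanning everything.
        candidates = set()
        for term in tf:
            candidates |= self.index.get(term, set())

        best = 0.0
        for doc_id in candidates:
            other = self.vectors[doc_id]
            dot = sum(w * other[t] for t, w in tf.items() if t in other)
            best = max(best, dot / (norm * self.norms[doc_id]))

        # Store the report and update the index before the next one arrives.
        doc_id = len(self.vectors)
        self.vectors.append(tf)
        self.norms.append(norm)
        for term in tf:
            self.index[term].add(doc_id)

        # A report is flagged as new when nothing earlier is similar enough.
        return best < self.threshold
```

Calling `process` on each report in chronological order returns one flag per report. In a MapReduce setting, the candidate lookup and similarity scoring are the kind of steps that could be distributed across workers, but how the thesis partitions the work is not stated in the abstract.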
Keywords/Search Tags: New event detection, Single-pass clustering, Large scale dataset, MapReduce