The task of New Event Detection (NED) is to identify, in a chronologically ordered stream of news reports, the first report of a news topic's seed event. With the rapid development of the Internet, the number of news reports has increased enormously, and traditional NED algorithms have run into performance bottlenecks. Cloud computing offers a way to address these bottlenecks: technologies such as Hadoop can process large-scale datasets. It is therefore important to develop an efficient NED algorithm suited to a distributed computing environment.

This thesis develops an improved NED algorithm by analyzing the performance bottlenecks of traditional algorithms and the strategies for parallelizing them. It then studies the implementation of the improved algorithm on the Hadoop distributed platform, which handles large datasets of news reports. The thesis first introduces the background, research significance, and current status of NED. It then describes the key technologies used in traditional NED and the MapReduce framework. Next, it proposes an improved NED algorithm that uses an inverted index to reduce time complexity and parallelizes some of its procedures to run faster. The thesis also designs and implements a distributed NED system based on MapReduce. Finally, a set of experiments demonstrates the feasibility and effectiveness of the new algorithm.
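
To illustrate why an inverted index can reduce the cost of NED, the following is a minimal single-machine sketch, not the algorithm proposed in the thesis. It assumes raw term-frequency vectors, cosine similarity, and a hypothetical similarity threshold; the names `tokenize`, `detect_new_events`, and `threshold` are illustrative only. Each incoming report is compared only with earlier reports that share at least one term with it, rather than with every earlier report.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


def detect_new_events(reports, threshold=0.3):
    """Return the indices of reports flagged as first stories.

    reports:   list of strings in chronological order.
    threshold: hypothetical cosine-similarity cutoff (an assumption).
    """
    index = defaultdict(list)  # term -> [(doc_id, term_frequency), ...]
    norms = []                 # doc_id -> Euclidean norm of its TF vector
    first_stories = []

    for doc_id, text in enumerate(reports):
        tf = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in tf.values()))

        # Accumulate dot products only for earlier reports that share a term
        # with this one; the postings lists supply exactly those candidates.
        dots = defaultdict(float)
        for term, count in tf.items():
            for other_id, other_count in index[term]:
                dots[other_id] += count * other_count

        # Highest cosine similarity to any earlier report (0.0 if none).
        best = max(
            (dot / (norm * norms[other]) for other, dot in dots.items()),
            default=0.0,
        )
        if best < threshold:
            first_stories.append(doc_id)

        # Add the current report to the index for later comparisons.
        for term, count in tf.items():
            index[term].append((doc_id, count))
        norms.append(norm)

    return first_stories
```

For example, `detect_new_events(["earthquake hits city", "earthquake hits city centre", "election results announced"])` returns `[0, 2]`: the second report is close to the first, while the third shares no terms with either earlier report. In a MapReduce setting, index construction and candidate scoring are the natural steps to distribute, but how the thesis partitions them is not stated in this abstract.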