
Online New Event Detection Based On The Elements Of News

Posted on: 2014-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Li
Full Text: PDF
GTID: 2248330395477615
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, online news has become an important means for people to obtain information. However, online news is disorganized and grows rapidly, so it is difficult to extract useful, up-to-date information from such a large body of data. The subject of this study, New Event Detection (NED), is one of the tasks of Topic Detection and Tracking (TDT); its goal is to detect new events in a chronological stream of news reports.

The thesis examines the effect of four elements of news: "when", "who", "where" and "how". Place, people and content are used to measure the similarity between reports and events. It also studies the support vector machine (SVM) and discusses its use in NED. The thesis proposes a new online NED method based on the elements of news reports. First, it builds a new model for events and reports whose elements are time, place, people and content; using multiple elements makes it easier to distinguish similar events. Second, to prevent the event center from drifting, a dynamic updating algorithm is applied so that each event center is updated as new reports join the event. Third, it provides a separate similarity algorithm for each element, including a geographical-ontology-based place similarity algorithm and a Wikipedia-based semantic similarity algorithm. Fourth, an SVM-based training method is used to determine the importance of each element. Fifth, the algorithm builds on single-pass clustering and uses a sliding time window to reduce its time complexity.

Finally, an experiment is designed in which miss rate, false alarm rate, detection cost (Cdet) and running time are used to evaluate the results. The method outperforms the two benchmark algorithms. It achieves its best results when the vector sizes of reports and events are 50 and 200 respectively, the event centers are calculated, and the time window is 4.
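The single-pass procedure with a sliding time window and dynamically updated event centers can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `Event`, `detect_new_events`, `threshold` and `window` are hypothetical, plain cosine similarity over term weights stands in for the per-element similarity algorithms, the center update is assumed to be a running mean, and the unit of the time window is left abstract because the abstract does not state it.

```python
from dataclasses import dataclass
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Event:
    center: dict    # running mean of the member report vectors
    size: int       # number of reports absorbed so far
    last_seen: int  # time stamp of the most recent member report

    def absorb(self, vec, t):
        """Fold a report into the center so it tracks all member reports."""
        n = self.size
        terms = set(self.center) | set(vec)
        self.center = {
            k: (self.center.get(k, 0.0) * n + vec.get(k, 0.0)) / (n + 1)
            for k in terms
        }
        self.size = n + 1
        self.last_seen = t


def detect_new_events(reports, threshold=0.3, window=4):
    """reports: (time, vector) pairs in chronological order.

    Returns one flag per report: True if the report starts a new event.
    """
    events, flags = [], []
    for t, vec in reports:
        # Compare only against events that received a report inside the window.
        active = [e for e in events if t - e.last_seen <= window]
        best = max(active, key=lambda e: cosine(vec, e.center), default=None)
        if best is not None and cosine(vec, best.center) >= threshold:
            best.absorb(vec, t)  # known event: update its center
            flags.append(False)
        else:
            events.append(Event(center=dict(vec), size=1, last_seen=t))
            flags.append(True)   # no close event in the window: new event
    return flags
```

Each report is compared once against the currently active events, so the window bounds the number of comparisons per report. A report that resembles an old event whose last report falls outside the window is treated as a new event.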
Although manually tuning the parameters can yield good performance, it requires repeated adjustment and rarely finds the best parameters, whereas the SVM-based method obtains better results at almost the same speed.
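As a rough illustration of the manually tuned variant, the sketch below combines per-element similarities with hand-set weights. The function name `combined_similarity`, the weighted-average form and the weight values are assumptions made for illustration; the abstract does not give the actual combination rule or parameters. In the SVM-based variant, the importance of each element would instead be learned from labeled data, which is not shown here.

```python
def combined_similarity(sims, weights):
    """Weighted average of per-element similarities, each assumed to lie in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[k] * sims[k] for k in weights) / total


# Hypothetical hand-set weights; the real values are not stated in the abstract.
manual_weights = {"place": 0.2, "people": 0.3, "content": 0.5}

# Hypothetical similarities between one report and one event.
sims = {"place": 1.0, "people": 0.5, "content": 0.4}

score = combined_similarity(sims, manual_weights)
```

Tuning by hand means repeating the whole evaluation for every candidate weight setting, which is the cost the abstract attributes to the manual method.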
Keywords/Search Tags: new event detection, elements of news, single-pass clustering, semantic similarity, SVM