
Research On Topic Detection In English News Reports

Posted on: 2009-01-07 | Degree: Master | Type: Thesis
Country: China | Candidate: K H Zhang | Full Text: PDF
GTID: 2178360272480385 | Subject: Computer technology
Abstract/Summary:
Topic Detection and Tracking (TDT) is a relatively new area of natural language processing that draws on information retrieval, information filtering, artificial intelligence, and machine learning, and its interdisciplinary nature makes it challenging. TDT research aims to organize and use information on the basis of events, and is an applied response to information overload. TDT refers to automatic techniques for detecting new topics and threading together topically related material in streams of news data such as newswire and broadcast news. Since the idea of TDT was proposed in 1996, many research organizations at home and abroad have taken part in it.

This thesis addresses English news streams in text form and presents solutions to the topic detection task of TDT. First, we introduce the basic concepts of topic detection and tracking and its development trends. We then describe in detail the System Similarity Model (SSM) and the method of similarity calculation. We analyze the effect of term variants and abbreviations on TDT research and present an efficient, novel recognition method for discovering morphologically related term variants and abbreviations.

The thesis also addresses difficulties in topic detection research, including the problem of topics that are difficult to distinguish, and proposes a topic detection method based on semantic division that exploits properties of the writing and content of English news stories. Experimental results show that the method is effective for the difficult-to-distinguish problem. Finally, by analyzing the temporal properties of topics, we propose a dynamic threshold model based on topic duration and explore a ratio method for selecting the stories most similar to a topic. The results show that the method significantly improves system performance.
Keywords/Search Tags: topic detection and tracking, topic detection, System Similarity Model (SSM), semantic division
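
The abstract does not give the formulas behind the similarity calculation or the dynamic threshold model, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes plain term-frequency vectors with cosine similarity (not SSM), single-pass clustering, and a hypothetical rule in which the matching threshold rises linearly with the number of days since a topic's last story. The names `detect_topics` and `dynamic_threshold` and the parameters `base` and `rate` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; a real system would also handle
    # stop words, term variants, and abbreviations.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dynamic_threshold(base, rate, days_since_last_story):
    # Assumed rule (not from the thesis): the longer a topic has been
    # silent, the stricter the similarity required to join it.
    return min(1.0, base + rate * days_since_last_story)


def detect_topics(stories, base=0.3, rate=0.02):
    """Single-pass topic detection.

    stories: list of (day, text) pairs sorted by day.
    Returns one topic index per story.
    """
    topics = []  # each topic: {"centroid": Counter, "last_day": int}
    labels = []
    for day, text in stories:
        vec = Counter(tokenize(text))
        best_idx, best_sim = None, 0.0
        for idx, topic in enumerate(topics):
            sim = cosine(vec, topic["centroid"])
            threshold = dynamic_threshold(base, rate, day - topic["last_day"])
            if sim >= threshold and sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            # No existing topic is similar enough: start a new topic.
            topics.append({"centroid": vec, "last_day": day})
            labels.append(len(topics) - 1)
        else:
            # Merge the story into the best-matching topic.
            topics[best_idx]["centroid"].update(vec)
            topics[best_idx]["last_day"] = day
            labels.append(best_idx)
    return labels


if __name__ == "__main__":
    sample = [
        (0, "earthquake hits city rescue teams arrive"),
        (1, "rescue teams search city after earthquake"),
        (2, "election results announced candidate wins vote"),
    ]
    print(detect_topics(sample))
```

In this sketch the first two stories share enough terms to be grouped, while the third starts a new topic. Tuning `base` and `rate` trades missed detections against false alarms.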