
The Research In New Event Detection Of News Story's Characteristic Based On SVM

Posted on: 2012-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: S X Zhou
GTID: 2218330368978993
Subject: Computer application technology
Abstract/Summary:
News web pages are an important source of information, since the Internet provides the most up-to-date news. However, the sheer volume of this information is overwhelming. To address this problem, news articles need to be organized in useful ways. Topic Detection and Tracking (TDT) is an applied research area that responds to this information overload. Since TDT was proposed in 1996, many organizations and institutions, especially in the United States, have participated in TDT research. New Event Detection (NED) has been a research topic in the TDT community for several years. It addresses the problem by organizing news stories according to the events they report. Because of TDT's practical value for information retrieval and text categorization, related studies have become an important part of information processing.
A basic NED system consists of the following processing blocks: document preprocessing, document feature selection, document similarity calculation, and window size selection. In this thesis, we present a new NED system that builds on research findings from both China and abroad. The thesis addresses Chinese news streams in text form and makes improvements in the representation of time nouns, the standardization of place names, and the construction of a name bank. In addition, the most important information in a news article is usually presented at its beginning. Based on this observation, we modify the term-weighting component of the Okapi similarity measure in several ways and apply the resulting measures to NED. Regarding text categorization, Support Vector Machines (SVMs) have been applied successfully in pattern recognition and have received extensive attention in machine learning. SVMs have also been proposed for novelty detection problems, whose objective is to distinguish novel objects from existing instances. NED can be treated as a special application of novelty detection.
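The idea of boosting terms that appear early in a story, inside an Okapi-style weight, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual function: the abstract does not give the formula, so the linear boost `1 + alpha * (1 - p / n)` and the helper names `position_weighted_tf`, `bm25_weight`, `okapi_vector`, and `cosine` are hypothetical. The BM25 term weight itself, with the common defaults k1 = 1.2 and b = 0.75, is the standard Okapi formula; comparing two weighted vectors with cosine similarity is one common choice and is also an assumption here. Tokens are assumed to have already been produced by a Chinese word segmenter.

```python
import math
from collections import Counter


def position_weighted_tf(tokens, alpha=1.0):
    """Term frequency in which early occurrences count more.

    HYPOTHETICAL weighting (not taken from the thesis): an occurrence at
    0-based position p in a document of n tokens adds 1 + alpha * (1 - p / n),
    so the first token adds 1 + alpha and later tokens add progressively less.
    With alpha = 0 this reduces to the ordinary raw term count.
    """
    n = len(tokens)
    tf = Counter()
    for p, term in enumerate(tokens):
        tf[term] += 1.0 + alpha * (1.0 - p / n)
    return tf


def bm25_weight(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term, using a non-negative IDF variant."""
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def okapi_vector(tokens, df, num_docs, avg_doc_len, alpha=1.0):
    """Map each term to its BM25 weight computed from the position-weighted tf."""
    tf = position_weighted_tf(tokens, alpha)
    return {
        term: bm25_weight(freq, df.get(term, 0), num_docs, len(tokens), avg_doc_len)
        for term, freq in tf.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy example: tokens are assumed to be already word-segmented.
docs = [["earthquake", "sichuan", "rescue"], ["sichuan", "earthquake", "toll"]]
df = Counter(term for doc in docs for term in set(doc))
avg_len = sum(len(doc) for doc in docs) / len(docs)
v1 = okapi_vector(docs[0], df, len(docs), avg_len)
v2 = okapi_vector(docs[1], df, len(docs), avg_len)
print(round(cosine(v1, v2), 3))
```

Setting `alpha=0` recovers plain BM25 term frequencies, which gives a convenient baseline for measuring how much a positional boost helps.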
However, the best-performing NED approach in the TDT community has remained the nearest neighbor method with a suitable distance metric in the document vector space. We perform numerous experiments on a data set collected from Xinhuanet and several other websites. The test collection contains 14,295 documents spanning 2009 to 2010 and covers a number of events, twenty of which are annotated by humans. In this study, we developed an improved weighting function that uses term positions and is controlled by several parameters. We also investigated Support Vector Machines and kernel regression (as a smoothed nearest neighbor method) for the NED task and compared them with the nearest neighbor method. Our experimental results show that the new document feature selection, in combination with the Okapi similarity measure, improves effectiveness over a baseline system that uses the same nearest neighbor (NN) and SVM methods. The largest improvement, 13%, is obtained with SVM.
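The nearest neighbor baseline mentioned above can be illustrated with a short sketch: each incoming story is compared with the stories in a sliding window, and it is flagged as a new event when its highest similarity falls below a threshold. The function names, the raw term-frequency vectors, the whitespace tokenization (Chinese text would first need word segmentation), and the default `threshold` and `window` values are all assumptions for illustration; the SVM and kernel regression variants studied in the thesis are not shown.

```python
import math
from collections import Counter, deque


def tf_vector(text):
    """Raw term-frequency vector; whitespace tokenization is a simplification."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def detect_new_events(stories, threshold=0.2, window=100):
    """Nearest-neighbor NED: a story is new if no story in the window is similar enough."""
    history = deque(maxlen=window)  # only the last `window` story vectors are kept
    flags = []
    for text in stories:
        vec = tf_vector(text)
        # Similarity to the nearest neighbor among earlier stories (0.0 if none).
        best = max((cosine(vec, old) for old in history), default=0.0)
        flags.append(best < threshold)
        history.append(vec)
    return flags


stories = [
    "earthquake hits sichuan province",
    "sichuan earthquake death toll rises",
    "stock market falls sharply",
]
print(detect_new_events(stories))  # → [True, False, True]
```

The window size and threshold trade off against each other: a smaller window forgets old events sooner, so a late follow-up story may be wrongly flagged as new, while a lower threshold makes the detector more reluctant to declare novelty.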
Keywords/Search Tags: New Event Detection, Support Vector Machine, Vector Space Model