Research On Chinese News Incident Extraction Method

Posted on: 2016-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: D H Pei
Full Text: PDF
GTID: 2208330470970762
Subject: Computer application technology
Abstract/Summary:
News event extraction is an important research task in information extraction. Its main goal is to extract the event elements and the event-related information contained in news text. Because a news event in a news text is composed of sub-events, and the sub-events are correlated with the event elements, obtaining the news elements and related information by analyzing the sub-events is very important. This thesis therefore addresses the problem of news event element extraction and focuses on news event information acquisition, news sub-event type recognition, sub-event element extraction, and news element extraction. The main results are as follows:

1. Template-based news information extraction. Given the diverse structure of news web pages, the HtmlUnit crawler is combined with XPath to obtain news information templates, decompose news pages, and obtain the title, time, body text, and other important information of each page, so that news information is acquired automatically.

2. Automatic sub-event type identification based on a support vector machine model, and sub-event element extraction based on a maximum entropy model. Because event trigger words and part-of-speech (POS) features support sub-event type recognition, an event trigger word list is defined, and each sentence is checked for trigger words to obtain candidate events. The trigger word and its context are fused into features to build the support vector machine model for sub-event type recognition. Because trigger words and sentence syntax, used as features, support event element extraction, sub-event element templates are defined for the different sub-event types, and candidate sub-event elements are obtained according to these templates. The trigger words and syntactic features are then used to build the maximum entropy model for sub-event element extraction. Experimental results show that the proposed method can effectively identify sub-event types and sub-event elements.

3. A method for news event element extraction that fuses sub-event element correlations in an undirected graph model. Because the correlations between sub-event elements support news event element extraction, the relationships between sub-events and sub-event elements are analyzed first. An undirected graph model is then constructed, with the sub-event element representations as nodes and the relationships between sub-event elements as edges. Finally, following the idea of the PageRank algorithm, the weights of the nodes in the graph are calculated to extract the news event elements. Experimental results show that the proposed method can effectively identify news event elements.

4. The design and implementation of a prototype Chinese news event extraction system, which provides a platform for further research on Chinese news event extraction.
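
The graph-based step in point 3 can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the node representation, the edge weights, or the damping factor, so the function name `rank_nodes`, the weight values, the default `damping=0.85`, and the example labels below are all assumptions made for illustration. The sketch only shows how a PageRank-style iteration assigns weights to nodes of an undirected graph.

```python
from collections import defaultdict


def rank_nodes(edges, damping=0.85, iterations=50):
    """Return PageRank-style scores for the nodes of an undirected graph.

    edges: iterable of (node_a, node_b, weight) with positive weights.
    Each undirected edge is treated as two directed edges of equal weight.
    """
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = neighbors[a].get(b, 0.0) + w
        neighbors[b][a] = neighbors[b].get(a, 0.0) + w

    nodes = sorted(neighbors)
    n = len(nodes)
    # Total edge weight attached to each node, used to normalize outgoing mass.
    strength = {v: sum(neighbors[v].values()) for v in nodes}
    scores = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        updated = {}
        for v in nodes:
            # Each neighbor u passes on a share of its score proportional
            # to the weight of the edge between u and v.
            incoming = sum(
                scores[u] * w / strength[u] for u, w in neighbors[v].items()
            )
            updated[v] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


# Hypothetical example: four made-up element labels, with "place" related
# to every other element and no relation among the others.
example_edges = [
    ("place", "time", 1.0),
    ("place", "person", 1.0),
    ("place", "organization", 1.0),
]
scores = rank_nodes(example_edges)
top_element = max(scores, key=scores.get)
```

In this toy graph the most connected node receives the highest weight. In a real system the edge weights would come from the sub-event element correlations described above, and the highest-weighted nodes would be candidates for the news event elements.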
Keywords/Search Tags: news information acquisition, sub-event type recognition, sub-event element extraction, news element extraction, undirected graph model