
Research On Chinese Event Extraction

Posted on: 2016-02-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Gao
GTID: 2308330482979199 | Subject: Military Intelligence
Abstract/Summary:
With the rise of Web 2.0 and the rapid development of the Internet, people can obtain information online faster and more conveniently than ever. At the same time, however, they are overwhelmed by the explosive growth of information. Extracting the information that users actually need from massive amounts of unstructured text has therefore become an important problem, and information extraction technology emerged in response. Event extraction is an important research direction within information extraction: it helps users detect events of interest in unstructured text and then extract their elements, such as persons, places and times, storing them in structured form. Such information is valuable for direct browsing by users, machine translation, text retrieval, automatic text summarization, trend analysis and many other applications. This dissertation studies Chinese event extraction in three parts: temporal expression recognition, event detection and classification, and event argument role extraction. The main findings are as follows:

(1) Chinese temporal expression recognition. A recognition method based on optimized dictionary features and dependency relations is proposed. First, the traditional temporal dictionary features are optimized by dividing the temporal dictionary into a temporal word dictionary and a temporal unit dictionary, which addresses the long-distance dependency problem in temporal expressions. Second, dependency features are extracted on top of the optimized dictionary features to mine the structural information of temporal expressions. Finally, basic features, dictionary features and dependency features are combined, and temporal expressions are recognized with conditional random fields (CRFs). Experimental results on the ACE2005 Chinese corpus and the TempEval-2 Chinese corpus show that the proposed method outperforms traditional machine-learning-based methods in both precision and recall.

(2) Event detection and classification. An event trigger extraction method based on dependency parsing and classifier fusion is presented. By making full use of event element information and dependency syntactic information, the method extracts pairs of triggers and entity mentions, which improves recall. To prevent a drop in precision, the results of this pair-based extraction are fused with those of single-trigger extraction. Experimental results on the ACE2005 Chinese corpus show that the new method outperforms the single-trigger extraction method on both event detection and event classification.

(3) Event argument extraction. Traditional machine-learning-based argument extraction algorithms usually convert syntactic information into flat (planar) features and therefore cannot make full use of it. To address this, an event argument extraction method based on a convolution tree kernel is proposed. First, a basic tree structure is constructed to turn syntactic information into structural features. Second, because syntactic trees contain much redundant information, a clipping (pruning) algorithm is designed to optimize the tree structure and reduce the computation time of the convolution tree kernel. Finally, a compound kernel combining the flat features with the structural features is constructed and used to train an SVM classifier, which performs the event argument extraction. Experimental results on the ACE2005 Chinese corpus show that, compared with the traditional method, the new method significantly improves event argument extraction performance.
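The feature design in (1) can be illustrated with per-token feature dictionaries, the input form accepted by CRF toolkits such as sklearn-crfsuite. This is a minimal sketch under stated assumptions: the two dictionaries are tiny hypothetical stand-ins (the thesis's dictionaries are not given here), `deps` is a hypothetical per-token list of (head index, relation label) pairs from a dependency parser with -1 marking the root, and all feature names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionaries; the thesis's real dictionaries are not shown here.
TEMPORAL_WORDS = {"今天", "昨天", "明天", "上午", "下午", "晚上", "去年", "星期一"}
TEMPORAL_UNITS = {"年", "月", "日", "号", "时", "点", "分", "秒", "周"}


def token_features(
    tokens: List[str],
    i: int,
    deps: Optional[List[Tuple[int, str]]] = None,
) -> Dict[str, object]:
    """Basic + dictionary (+ optional dependency) features for tokens[i]."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        # Basic features
        "word": tok,
        "is_digit": tok.isdigit(),  # digit characters like "2005"; not Chinese numerals like 三
        # Dictionary features, looked up in two separate dictionaries
        "in_time_word_dict": tok in TEMPORAL_WORDS,
        "in_time_unit_dict": tok in TEMPORAL_UNITS,
        # Links a number to a following unit word, e.g. "3" + "点"
        "next_is_time_unit": i + 1 < len(tokens) and tokens[i + 1] in TEMPORAL_UNITS,
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if deps is not None:
        head, rel = deps[i]  # head == -1 marks the root
        feats["dep_rel"] = rel
        # Is this token governed by a word from either temporal dictionary?
        feats["head_in_time_dict"] = head >= 0 and (
            tokens[head] in TEMPORAL_WORDS or tokens[head] in TEMPORAL_UNITS
        )
    return feats


def sentence_features(
    tokens: List[str], deps: Optional[List[Tuple[int, str]]] = None
) -> List[Dict[str, object]]:
    """One feature dict per token, e.g. as input to a CRF trainer."""
    return [token_features(tokens, i, deps) for i in range(len(tokens))]
```

For the tokens `["他", "昨天", "下午", "3", "点", "出发"]`, the token `3` gets `next_is_time_unit=True` and `昨天` gets `in_time_word_dict=True`; paired with BIO labels, such lists could serve as CRF training data.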
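One way to read the pair-extraction step in (2) is to pair each candidate trigger with every entity mention it is directly linked to by a dependency arc. The sketch below assumes exactly that rule; the arc format, the `trigger_candidates` set (for example, verb and noun positions from a POS tagger) and the LTP-style relation labels such as `SBV` and `VOB` are illustrative assumptions, not the thesis's specification.

```python
from typing import List, Set, Tuple

Arc = Tuple[int, int, str]  # (head index, dependent index, relation label)
Span = Tuple[int, int]      # entity mention as a [start, end) token span


def candidate_pairs(
    arcs: List[Arc],
    entity_spans: List[Span],
    trigger_candidates: Set[int],
) -> List[Tuple[int, Span, str]]:
    """Return (trigger index, entity span, relation) for every dependency arc that
    directly links a candidate trigger to a token inside an entity mention."""
    pairs: List[Tuple[int, Span, str]] = []
    seen: Set[Tuple[int, Span]] = set()
    for head, dep, rel in arcs:
        for start, end in entity_spans:
            span = (start, end)
            if start <= dep < end and head in trigger_candidates and not start <= head < end:
                trigger = head  # trigger governs the entity (e.g. subject or object)
            elif start <= head < end and dep in trigger_candidates and not start <= dep < end:
                trigger = dep  # entity governs the trigger (e.g. a modifying verb)
            else:
                continue
            if (trigger, span) not in seen:  # keep the first arc found per pair
                seen.add((trigger, span))
                pairs.append((trigger, span, rel))
    return pairs
```

For "警方 逮捕 了 嫌疑人" (the police arrested the suspect) with `逮捕` as the only candidate trigger, both `警方` (via `SBV`) and `嫌疑人` (via `VOB`) are paired with it; each pair could then be classified for event type with the entity mention as extra evidence.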
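The abstract does not state how the two result sets in (2) are fused. The sketch below uses a simple hypothetical rule chosen to match the stated goal (gain recall without losing precision): single-trigger predictions are kept as the baseline, and predictions found only through trigger/entity pairs are admitted when their confidence reaches an assumed `threshold`. The confidence scores and the threshold value are assumptions.

```python
from typing import Dict, Tuple

Prediction = Tuple[str, float]  # (event type, classifier confidence)


def fuse_triggers(
    single: Dict[int, Prediction],
    paired: Dict[int, Prediction],
    threshold: float = 0.8,
) -> Dict[int, Prediction]:
    """Fuse trigger predictions keyed by token index (hypothetical fusion rule)."""
    fused: Dict[int, Prediction] = {}
    for pos in sorted(set(single) | set(paired)):
        s, p = single.get(pos), paired.get(pos)
        if s is not None and p is not None:
            # Both systems fire: same type -> keep it; different types -> trust the more confident one.
            if s[0] == p[0]:
                fused[pos] = (s[0], max(s[1], p[1]))
            else:
                fused[pos] = s if s[1] >= p[1] else p
        elif s is not None:
            fused[pos] = s  # single-trigger output kept as the precision-oriented baseline
        elif p is not None and p[1] >= threshold:
            fused[pos] = p  # extra recall from pairs admitted only when confident
    return fused
```

With `single = {1: ("Justice:Arrest-Jail", 0.9)}` and `paired = {1: ("Justice:Arrest-Jail", 0.7), 5: ("Conflict:Attack", 0.85), 7: ("Life:Die", 0.5)}`, positions 1 and 5 survive and the low-confidence, pair-only prediction at position 7 is dropped.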
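The clipping algorithm in (3) is not described in this abstract. As a stand-in, the sketch below implements a pruning scheme widely used with tree kernels for relation and argument extraction, the path-enclosed tree: take the smallest subtree that covers both the trigger and the candidate argument, then remove every branch whose words lie outside the span between them. The bracketed tree format and the leaf-index arguments `lo`/`hi` are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)  # empty list = word leaf


def parse(s: str) -> Node:
    """Read a bracketed tree such as "(NP (NN 警方))"."""
    toks = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if toks[pos] != "(":
            pos += 1
            return Node(toks[pos - 1])  # word leaf
        label = toks[pos + 1]
        pos += 2
        children = []
        while toks[pos] != ")":
            children.append(read())
        pos += 1  # skip ")"
        return Node(label, children)

    return read()


def to_str(n: Node) -> str:
    if not n.children:
        return n.label
    return "(" + n.label + " " + " ".join(to_str(c) for c in n.children) + ")"


def n_leaves(n: Node) -> int:
    return 1 if not n.children else sum(n_leaves(c) for c in n.children)


def lowest_covering(n: Node, lo: int, hi: int, offset: int = 0) -> Tuple[Node, int]:
    """Descend to the lowest node whose leaves still include positions lo..hi."""
    child_offset = offset
    for c in n.children:
        size = n_leaves(c)
        if child_offset <= lo and hi < child_offset + size:
            return lowest_covering(c, lo, hi, child_offset)
        child_offset += size
    return n, offset


def prune(n: Node, lo: int, hi: int, offset: int) -> Optional[Node]:
    """Drop every subtree whose leaves fall entirely outside lo..hi."""
    if offset + n_leaves(n) - 1 < lo or offset > hi:
        return None
    kept, child_offset = [], offset
    for c in n.children:
        pc = prune(c, lo, hi, child_offset)
        if pc is not None:
            kept.append(pc)
        child_offset += n_leaves(c)
    return Node(n.label, kept)


def path_enclosed_tree(root: Node, lo: int, hi: int) -> Optional[Node]:
    """lo/hi: leaf indices of the trigger and the candidate argument (lo <= hi)."""
    node, offset = lowest_covering(root, lo, hi)
    return prune(node, lo, hi, offset)
```

For "(IP (NP (NN 警方)) (VP (ADVP (AD 昨天)) (VP (VV 逮捕) (AS 了) (NP (NN 嫌疑人)))))" with argument `警方` (leaf 0) and trigger `逮捕` (leaf 2), the result keeps `昨天` but drops `了 嫌疑人`, so the kernel compares fewer nodes.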
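The convolution tree kernel and compound kernel in (3) can be sketched as follows. The tree kernel is the standard Collins-Duffy kernel, which counts shared subtree fragments with a decay factor `lam`; the compound kernel is assumed here to be a weighted sum of a cosine-normalized linear kernel over flat features and the normalized tree kernel. The weight `alpha`, the value of `lam` and the sparse-dictionary feature format are hypothetical choices, since the abstract does not give the thesis's exact combination.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, TypeVar


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)  # empty list = word leaf


def internal_nodes(n: Node) -> List[Node]:
    if not n.children:
        return []
    out = [n]
    for c in n.children:
        out.extend(internal_nodes(c))
    return out


def production(n: Node) -> Tuple[str, Tuple[str, ...]]:
    return n.label, tuple(c.label for c in n.children)


def tree_kernel(t1: Node, t2: Node, lam: float = 0.4) -> float:
    """Collins-Duffy convolution tree kernel: decayed count of shared fragments."""
    memo: Dict[Tuple[int, int], float] = {}

    def common(a: Node, b: Node) -> float:
        if not a.children or not b.children:
            return 0.0  # word leaves are not fragment roots
        key = (id(a), id(b))
        if key not in memo:
            if production(a) != production(b):
                memo[key] = 0.0
            elif all(not c.children for c in a.children):  # pre-terminal, e.g. (VV 逮捕)
                memo[key] = lam
            else:
                val = lam
                for ca, cb in zip(a.children, b.children):
                    val *= 1.0 + common(ca, cb)
                memo[key] = val
        return memo[key]

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


T = TypeVar("T")


def normalized(k: Callable[[T, T], float], x: T, y: T) -> float:
    """Cosine-normalize a kernel: k(x, y) / sqrt(k(x, x) * k(y, y))."""
    kxx, kyy = k(x, x), k(y, y)
    if kxx == 0.0 or kyy == 0.0:
        return 0.0
    return k(x, y) / math.sqrt(kxx * kyy)


def linear(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Dot product of sparse flat feature vectors."""
    return sum(val * v.get(f, 0.0) for f, val in u.items())


Instance = Tuple[Node, Dict[str, float]]  # (pruned tree, flat features)


def compound_kernel(x: Instance, y: Instance, alpha: float = 0.5) -> float:
    """Hypothetical weighted sum of the flat and structural kernels."""
    (tx, fx), (ty, fy) = x, y
    return alpha * normalized(linear, fx, fy) + (1 - alpha) * normalized(tree_kernel, tx, ty)
```

To train the SVM, one could build the Gram matrix `G[i][j] = compound_kernel(train[i], train[j])` and pass it to scikit-learn's `SVC(kernel="precomputed")`; at prediction time, each row holds kernel values between a test instance and all training instances. Memoizing node-pair scores keeps the tree kernel quadratic in the number of nodes, which is why pruning the trees first reduces its cost.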
Keywords/Search Tags: temporal expression recognition, event extraction, trigger, event argument role, dependency parsing, convolution tree kernel