
Research On Chinese Event Extraction

Posted on: 2009-12-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Y Tan
GTID: 1118360278962054
Subject: Computer application technology

Abstract/Summary:
Information extraction (IE) is a fundamental technique for automatically obtaining information from texts. IE from free text includes the extraction of entities and of the relationships between them. However, as the real world changes, the status of entities and their relationships change too, and events reflect exactly this kind of change. To capture such changes, it is therefore necessary to extract information centered on events. Event Detection and Recognition (VDR) is now defined as a fundamental task in the Automatic Content Extraction (ACE) evaluation program. The ACE 2005 VDR task, for example, mainly involves detecting events of specified types and extracting information related to them: event attributes, event arguments, and event mentions. The event attributes are type, subtype, modality, polarity, genericity, and tense. Under this definition, VDR comprises two subtasks: (1) event detection and classification, and (2) recognition of argument roles. Since argument roles are often filled by the entities involved in an event, named entity recognition is a fundamental subtask for VDR. This thesis studies three subtasks related to VDR: expanded named entity recognition, event detection and classification, and recognition of argument roles. Finally, because VDR accuracy is still imperfect, the thesis discusses confidence estimation for VDR.

The main research contents of this thesis are as follows.

(1) Study on expanded named entity recognition. To alleviate the scarcity of large-scale annotated corpora, bootstrapping, a semi-supervised learning method, is used to acquire patterns automatically, and the selection and evaluation of seeds and examples are discussed in detail. On this basis, the thesis focuses on pattern generalization and presents two ways to generalize patterns: soft patterns and feature vectors.
Both effectively improve pattern coverage and system performance.

(2) Study on event detection and classification. Because the ACE corpus is small and its event types are imbalanced, the classifier performs poorly on small and difficult types; the thesis addresses this with a feature selection strategy. It proposes an approach to identifying Chinese event types that combines local feature selection with positive and negative features, which maintains performance on each type, especially the small and difficult ones. In addition, the thesis presents an approach that recognizes triggers for known event types using a maximum entropy (ME) model. The approach uses features found in the positive and negative examples and expands them with two semantic dictionaries, HowNet and CiLin.

(3) Study on recognition of argument roles. First, an approach using multi-level patterns to identify Chinese event argument roles is proposed. It introduces four levels of patterns to make full use of word and dependency grammar information; the higher-level patterns are soft patterns, which encode flexible information and support fuzzy matching. The thesis then introduces the multi-level patterns as features into a CRF model to identify argument roles. Experiments show that incorporating multi-level patterns into the statistical model effectively improves system performance.

(4) Study on confidence estimation (CE) for event extraction. Because the precision of automatic event extraction is imperfect, two CE methods are discussed: one uses the system's output probability as the confidence estimate, and the other uses a separate model-based CE module. ROC analysis is then used to evaluate the CE results.
Experimental results show that the separate CE module separates correct from incorrect outputs better than the original system output does, and can therefore provide more useful information to applications built on the system.
Keywords/Search Tags: event extraction, event detection and classification, argument role, named entity recognition, confidence estimation
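Item (1) describes bootstrapping for automatic pattern acquisition. The following Python sketch shows a generic version of such a loop. It makes three assumptions: candidates are single tokens, a pattern is the pair of left and right neighbouring words, and patterns are scored with the RlogF measure familiar from Riloff-style pattern bootstrapping. The corpus format, the scoring function, and the names `contexts`, `rlogf`, and `bootstrap` are illustrative assumptions. They are not the seed selection and evaluation criteria the thesis actually uses.

```python
import math
from collections import defaultdict


def contexts(corpus):
    """Yield (pattern, candidate) pairs. A pattern here is the (left, right)
    word pair around a single-token candidate (a hypothetical, simple form)."""
    for sent in corpus:
        for i in range(1, len(sent) - 1):
            yield (sent[i - 1], sent[i + 1]), sent[i]


def rlogf(known, extracted):
    """RlogF score: (F / N) * log2(F), where F = known entities the pattern
    extracts and N = all candidates it extracts."""
    f = len(extracted & known)
    return (f / len(extracted)) * math.log2(f) if f > 0 else 0.0


def bootstrap(corpus, seeds, iterations=3, patterns_per_iter=1):
    """Alternate between scoring patterns against the known entities and
    promoting whatever the best new patterns extract."""
    known = set(seeds)
    accepted = set()
    index = defaultdict(set)  # pattern -> candidates it extracts
    for pattern, cand in contexts(corpus):
        index[pattern].add(cand)
    for _ in range(iterations):
        scored = sorted(
            ((rlogf(known, ext), p) for p, ext in index.items() if p not in accepted),
            reverse=True,
        )
        new = [p for score, p in scored[:patterns_per_iter] if score > 0]
        if not new:
            break  # no pattern is supported by the current entity set
        for p in new:
            accepted.add(p)
            known |= index[p]
    return known, accepted


corpus = [
    ["the", "IBM", "company", "said"],
    ["the", "Intel", "company", "said"],
    ["the", "Lenovo", "company", "said"],
    ["the", "cat", "sat", "down"],
]
known, accepted = bootstrap(corpus, {"IBM", "Intel"})
print(sorted(known))  # ['IBM', 'Intel', 'Lenovo']
```

Each iteration promotes only the single highest-scoring pattern. This conservative choice limits semantic drift. A fuller system would also score candidate entities before promoting them, instead of accepting everything a new pattern extracts.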
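Item (2) combines local feature selection with positive and negative features. The abstract does not name the selection metric. This sketch therefore assumes the signed correlation coefficient (the square root of chi-square, keeping its sign), one common way to separate features that indicate membership in a class (positive) from features that indicate non-membership (negative). Selection is "local" because it runs separately for each event type, so small types get their own feature lists instead of being crowded out by large ones. The toy data, the event-type labels, and the `k_pos`/`k_neg` cut-offs are hypothetical.

```python
import math


def correlation_coefficient(a, b, c, d):
    """Signed correlation coefficient between a feature and a class.
    a = in class & has feature,   b = not in class & has feature,
    c = in class & lacks feature, d = not in class & lacks feature."""
    n = a + b + c + d
    denom = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    return 0.0 if denom == 0 else math.sqrt(n) * (a * d - c * b) / denom


def local_pos_neg_features(samples, labels, k_pos=2, k_neg=1):
    """For each class separately, keep the k_pos features with the largest
    positive score and the k_neg features with the most negative score."""
    vocab = set().union(*samples)
    selected = {}
    for cls in sorted(set(labels)):
        scores = {}
        for f in vocab:
            a = sum(1 for s, y in zip(samples, labels) if y == cls and f in s)
            b = sum(1 for s, y in zip(samples, labels) if y != cls and f in s)
            c = sum(1 for s, y in zip(samples, labels) if y == cls and f not in s)
            d = len(samples) - a - b - c
            scores[f] = correlation_coefficient(a, b, c, d)
        ranked = sorted(vocab, key=lambda f: (-scores[f], f))  # best first
        pos = [f for f in ranked if scores[f] > 0][:k_pos]
        neg = [f for f in reversed(ranked) if scores[f] < 0][:k_neg]
        selected[cls] = (pos, neg)
    return selected


# Hypothetical toy sentences, each reduced to a set of word features.
samples = [
    {"bomb", "explode"},
    {"bomb", "fire"},
    {"wedding", "marry"},
    {"wedding", "fire"},
]
labels = ["Attack", "Attack", "Marry", "Marry"]
for cls, (pos, neg) in local_pos_neg_features(samples, labels).items():
    print(cls, pos, neg)
# Attack ['bomb', 'explode'] ['wedding']
# Marry ['wedding', 'marry'] ['bomb']
```

A feature such as "fire", which occurs equally inside and outside a class, scores zero and is dropped. Negative features like "wedding" for Attack give the classifier explicit evidence against a type, which is the motivation for keeping them.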
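Item (2) also expands trigger features with HowNet and CiLin. Here is a minimal sketch of the idea, assuming each dictionary can be reduced to a word-to-code lookup. A candidate trigger receives its synonym-class and sememe codes as extra features, so a word never seen as a trigger in training can still share features with one that was. Both dictionaries below are tiny hypothetical stand-ins with made-up codes, not real CiLin or HowNet entries. The resulting feature sets would still have to be fed to an ME classifier, which is not shown.

```python
# Hypothetical stand-ins for CiLin (synonym classes) and HowNet (sememes).
# The codes are made up for illustration; the real resources use their own formats.
TOY_SYNONYM_CLASS = {"袭击": "SYN_ATTACK", "攻击": "SYN_ATTACK", "结婚": "SYN_MARRY"}
TOY_SEMEME = {"袭击": "SEM_fight", "攻击": "SEM_fight", "结婚": "SEM_marry"}


def trigger_features(word, pos_tag):
    """Lexical features for a candidate trigger, expanded with dictionary codes."""
    feats = {f"word={word}", f"pos={pos_tag}"}
    if word in TOY_SYNONYM_CLASS:
        feats.add(f"syn={TOY_SYNONYM_CLASS[word]}")
    if word in TOY_SEMEME:
        feats.add(f"sem={TOY_SEMEME[word]}")
    return feats


# Suppose only 袭击 ("raid, attack") appeared as a trigger in the training data.
seen = trigger_features("袭击", "VV")
unseen = trigger_features("攻击", "VV")
print(sorted(seen & unseen))  # ['pos=VV', 'sem=SEM_fight', 'syn=SYN_ATTACK']
```

Without the expansion, the two words would share only the part-of-speech feature. With it, the unseen word inherits the dictionary-level evidence learned for the seen one.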
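Item (4) evaluates confidence estimates with ROC analysis. The sketch below assumes that each extraction has a confidence score and a gold label saying whether it is correct. It sweeps a threshold down over the scores to build the ROC curve and integrates the curve with the trapezoidal rule. A larger area under the curve (AUC) means the scores separate correct from incorrect extractions better. The scores in the usage example are invented for illustration and are not results from the thesis.

```python
def roc_points(confidences, correct):
    """ROC points (FPR, TPR) from sweeping a threshold down over the confidences.

    correct[i] is True if extraction i was right. Tied scores are consumed
    together, so they produce a single (possibly diagonal) step.
    """
    pos = sum(1 for c in correct if c)
    neg = len(correct) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one correct and one incorrect extraction")
    ranked = sorted(zip(confidences, correct), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for the same four extractions from two CE strategies.
correct = [True, True, False, True]
output_prob = [0.9, 0.8, 0.7, 0.6]
ce_scores = [0.9, 0.8, 0.5, 0.6]
print(auc(roc_points(output_prob, correct)))  # 0.6666666666666666
print(auc(roc_points(ce_scores, correct)))  # 1.0
```

Because AUC depends only on how the scores rank the extractions, two CE strategies can be compared on it even when their scores are on different scales.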