
Research On Domain Entity Attribute And Event Extraction Technology

Posted on: 2009-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: E B Feng
Full Text: PDF
GTID: 2178360278464333
Subject: Computer Science and Technology
Abstract/Summary:
Information extraction (IE) has become an active research topic in natural language processing. The information extracted by IE systems can be delivered directly to end users, and it is also the first step in building intelligent query systems and data mining systems. Within information extraction, entity-attribute extraction and event extraction both provide the groundwork for specific applications: entity-attribute extraction can be applied to entity definition and data mining, while event extraction can be applied to event classification and tracking. A self-learning method and the maximum entropy model are used in this work. The dissertation covers the following aspects:
1. Domain character recognition. Domain character recognition is the preparatory step for entity-attribute extraction. In this thesis, a self-learning method is adopted for domain character recognition. First, domain lexes are used as seed words to recognize domain characters; second, rules are learned from the recognized domain characters; third, the learned rules are used to recognize further domain characters and domain lexes; finally, the new domain lexes become the new seed words. The iteration repeats until no new domain lexes are found. The method obtained satisfactory experimental results.
2. Entity-attribute extraction. Entity-attribute extraction aims to extract attributes and their corresponding attribute values. In this thesis, it is based on parsing and combines rule-based and statistical methods. First, the text is parsed after domain characters have been recognized; second, the syntactic chunks that contain attributes and attribute values are extracted from the parse trees; finally, the attributes and their corresponding values are extracted from those chunks.
3. Event extraction. A maximum entropy model is used for event extraction. First, all event elements in the corpus are recognized using rule-based and statistical methods respectively; second, a model trained with the maximum entropy algorithm judges whether each event element is related to the event. The method achieved good results.
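The self-learning loop described in aspect 1 can be illustrated with a minimal sketch. The abstract does not specify what form the learned rules take, so everything concrete here is an assumption made for illustration: a "rule" is simply a token that precedes known lexicon words at least `min_rule_support` times, and the names `bootstrap_lexicon`, `min_rule_support`, and `max_iters` are hypothetical. Only the overall shape (seed words, rule learning, rule application, repetition until no new words appear) comes from the abstract.

```python
from collections import Counter


def bootstrap_lexicon(sentences, seeds, min_rule_support=2, max_iters=10):
    """Grow a lexicon from seed words until no new word is found.

    Assumption (not from the thesis): a rule is a left-context token
    that precedes known lexicon words often enough.
    """
    lexicon = set(seeds)
    for _ in range(max_iters):
        # Learn rules: count tokens that appear right before a known word.
        contexts = Counter()
        for tokens in sentences:
            for i in range(1, len(tokens)):
                if tokens[i] in lexicon:
                    contexts[tokens[i - 1]] += 1
        rules = {tok for tok, n in contexts.items() if n >= min_rule_support}

        # Apply rules: any unknown token following a rule token is a candidate.
        new_words = set()
        for tokens in sentences:
            for i in range(1, len(tokens)):
                if tokens[i - 1] in rules and tokens[i] not in lexicon:
                    new_words.add(tokens[i])

        # Stop when an iteration yields nothing new.
        if not new_words:
            break
        # Otherwise the new words become seeds for the next round.
        lexicon |= new_words
    return lexicon


# Toy, made-up data purely for demonstration.
sentences = [
    ["the", "camera", "has", "a", "zoom"],
    ["the", "phone", "has", "a", "screen"],
    ["a", "lens", "is", "included"],
    ["the", "tripod", "is", "sold", "separately"],
]
seeds = {"camera", "phone"}
result = bootstrap_lexicon(sentences, seeds)
print(sorted(result))  # ['camera', 'phone', 'tripod']
```

In this toy run, "the" becomes a rule because it precedes both seed words, which pulls in "tripod"; the second pass finds nothing new and the loop stops. A real system would need richer rules and some filtering to limit drift as the lexicon grows.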
Keywords/Search Tags: information extraction, entity-attribute extraction, event extraction, hidden Markov model, maximum entropy model