
Research On Related Technologies Of Domain Information Extraction

Posted on: 2011-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2178330338479973
Subject: Computer Science and Technology
Abstract/Summary:
A huge number of documents are available on the Internet, and extracting the most useful information from them remains a challenge. Information extraction (IE) techniques address this problem. The main goal of IE is to transform free text into structured or semi-structured information for applications such as question answering.

This thesis focuses on the three basic tasks of information extraction: named entity recognition, relation extraction, and event extraction. Building on named entity recognition, we study relation extraction and event extraction, and then implement a complete Chinese domain information extraction system. Our approach is based on rule/pattern matching, and we use a different method for each task to acquire the rules/patterns automatically. The achievements and contributions of this thesis are as follows:

1. Domain entity recognition. The task of domain entity recognition is to identify the named entities related to a specific domain and then assign them a part of speech. We identify domain entities by rule matching based on domain directive words. We propose a method that automatically learns domain rules from domain entities and expands domain knowledge from the training corpus.

2. Domain event extraction. We extract events by pattern matching on top of named entity recognition. We propose a method for obtaining domain event patterns: first, cluster the pattern instances; then convert each pattern instance into a candidate pattern; finally, merge the candidate patterns to obtain the final event patterns.

3. Domain relation extraction. We extract changes in staff positions in a specific field. We extract domain relations by pattern matching, and then use a "pattern-action" template library to analyze and correct the extraction results. Given a seed set, we use the bootstrapping method to obtain the relation patterns.
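The bootstrapping idea mentioned for relation patterns can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `bootstrap`, the English toy sentences, and the choice to treat the text between a person and a position as the pattern are all assumptions made for illustration. The thesis works on Chinese text, and a real system would also score and filter patterns, which is omitted here.

```python
import re

def bootstrap(sentences, seeds, rounds=2):
    """Grow relation patterns and (person, position) pairs from a seed set.

    Hypothetical sketch: a "pattern" is simply the text found between
    a known person and a known position in a sentence.
    """
    pairs = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step 1: induce patterns from sentences that mention a known pair.
        for s in sentences:
            for person, position in pairs:
                m = re.search(re.escape(person) + r"(.+?)" + re.escape(position), s)
                if m:
                    patterns.add(m.group(1))
        # Step 2: apply every known pattern to harvest new pairs.
        for s in sentences:
            for middle in patterns:
                m = re.fullmatch(r"(\w+)" + re.escape(middle) + r"(\w+)\.?", s)
                if m:
                    pairs.add((m.group(1), m.group(2)))
    return patterns, pairs

# Toy corpus (invented for illustration, not taken from the thesis).
sentences = [
    "Alice was appointed as director.",
    "Bob was appointed as manager.",
    "Bob took over as manager.",
    "Carol took over as chairman.",
]
seeds = {("Alice", "director")}
patterns, pairs = bootstrap(sentences, seeds)
```

On this toy corpus, the first round learns " was appointed as " from the seed and uses it to find the pair (Bob, manager). The second round uses that new pair to learn " took over as ", which in turn yields (Carol, chairman).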
Keywords/Search Tags: information extraction, domain entity recognition, domain event extraction, domain relation extraction, pattern matching