Research On Domain Adaptive Chinese Entity Relation Extraction

Posted on: 2012-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: L F Wang
GTID: 2218330362450415
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid spread of computers and the fast growth of the Internet, the amount of available information keeps increasing. How to obtain the needed information quickly and accurately from massive data has therefore become a topic of wide concern. The main purpose of information extraction is to transform unstructured natural language text into semi-structured or structured data, so that people can obtain key information quickly and accurately. Relation extraction is one of the subtasks and key technologies of information extraction, and it has gradually become an important supporting technique for many natural language processing tasks.
Traditional relation extraction methods require pre-defined relation types and rely on large manually annotated training corpora. This makes them hard to apply to massive-scale Internet information processing. We propose a new relation extraction research framework that avoids human intervention as far as possible and has strong domain adaptability. The aim is to increase the automation and improve the portability of relation extraction.
First, we analyzed the linguistic phenomena in the contexts of relation instances. We found that the vast majority of entity pairs holding a semantic relation are triggered or described by general verbs and nouns, referred to as feature words. This thesis therefore proposes a feature word clustering method. The method automatically discovers relation types from a moderate amount of unlabeled corpus, with results comparable to manually predefined relation types. Second, a large number of relation types must be processed. To handle them, this thesis proposes a Web-mining-based relation seed extraction method that makes full use of the large-scale data collection and processing capabilities of search engines to extract a representative relation core network. The method achieves an average precision of 90.91% on nine selected relation types.
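The feature word clustering idea can be illustrated with a minimal sketch. The abstract does not say how feature words are represented or clustered, so the sketch makes its own assumptions. It assumes each feature word is represented by the entity pairs it links in the corpus, compares words by cosine similarity, and groups them by greedy agglomerative merging. The names `build_vectors`, `cluster_feature_words`, and `threshold` are hypothetical. The toy English data stands in for segmented Chinese text.

```python
import math
from collections import Counter, defaultdict


def build_vectors(instances):
    """Map each feature word to a count vector over the entity pairs it links.

    instances: list of (feature_word, (entity1, entity2)) tuples, a
    hypothetical input format assumed for this sketch.
    """
    vectors = defaultdict(Counter)
    for word, pair in instances:
        vectors[word][pair] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(cluster, vectors):
    """Sum the count vectors of all feature words in a cluster."""
    total = Counter()
    for word in cluster:
        total.update(vectors[word])
    return total


def cluster_feature_words(vectors, threshold=0.5):
    """Greedy agglomerative clustering (an assumed choice): repeatedly merge
    the most similar pair of clusters until no pair reaches `threshold`.
    Each resulting cluster is read as one candidate relation type."""
    clusters = [[word] for word in sorted(vectors)]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i], vectors),
                             centroid(clusters[j], vectors))
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


instances = [
    ("founded", ("Jobs", "Apple")), ("founded", ("Gates", "Microsoft")),
    ("established", ("Jobs", "Apple")), ("established", ("Gates", "Microsoft")),
    ("married", ("Obama", "Michelle")), ("wed", ("Obama", "Michelle")),
]
print(cluster_feature_words(build_vectors(instances)))
# → [['established', 'founded'], ['married', 'wed']]
```

Words that link the same entity pairs end up in one cluster, which is the intuition behind treating feature words as relation type indicators. A real system would also need a way to name each cluster and to tune the threshold.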
Next, based on the characteristics of the Chinese language, this thesis defines general context patterns and their generalization, and then introduces a bootstrapping method. Taking the relation core network as input, the method iteratively generates relation description patterns and extracts relation tuples. Manual evaluation of sampled relation tuples shows an average precision of 88.24%, which meets practical needs.
Finally, a domain adaptive relation extraction platform named XInfo is designed and implemented. On this platform, researchers can focus on improving and studying algorithms and can run experiments rapidly. XInfo can also support natural language processing research and applications. In addition, this thesis takes social relations between people as an application task. It develops an online demo system that presents relation extraction results intuitively and clearly.
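The bootstrapping loop can be sketched in simplified form. The sketch assumes the corpus has already been segmented into candidate triples of (entity1, middle context, entity2). It treats the exact middle context as the pattern, which skips the pattern generalization step the thesis defines. The names `bootstrap`, `min_support`, and `max_iter` are hypothetical, and the data is a toy English example rather than the author's method or corpus.

```python
from collections import defaultdict


def bootstrap(corpus, seeds, min_support=2, max_iter=5):
    """Alternate pattern induction and tuple extraction until nothing new is found.

    corpus: list of (entity1, middle_context, entity2) candidate triples
            (hypothetical pre-processed format; real Chinese text would first
            need word segmentation and entity recognition).
    seeds:  set of (entity1, entity2) pairs, e.g. from a relation core network.
    """
    pairs = set(seeds)
    patterns = set()
    for _ in range(max_iter):
        # Pattern induction: a middle context becomes a pattern once it links
        # at least `min_support` distinct known entity pairs.
        support = defaultdict(set)
        for e1, middle, e2 in corpus:
            if (e1, e2) in pairs:
                support[middle].add((e1, e2))
        new_patterns = {m for m, linked in support.items() if len(linked) >= min_support}

        # Tuple extraction: every candidate matching an accepted pattern yields a pair.
        new_pairs = {(e1, e2) for e1, middle, e2 in corpus if middle in new_patterns}

        # Stop early once an iteration adds neither patterns nor pairs.
        if new_patterns <= patterns and new_pairs <= pairs:
            break
        patterns |= new_patterns
        pairs |= new_pairs
    return patterns, pairs


corpus = [
    ("Jobs", "founded", "Apple"),
    ("Gates", "founded", "Microsoft"),
    ("Page", "founded", "Google"),
    ("Gates", ", the founder of", "Microsoft"),
    ("Page", ", the founder of", "Google"),
    ("Musk", ", the founder of", "SpaceX"),
    ("Jobs", "visited", "Paris"),
]
patterns, pairs = bootstrap(corpus, {("Jobs", "Apple"), ("Gates", "Microsoft")})
print(sorted(pairs))
# → [('Gates', 'Microsoft'), ('Jobs', 'Apple'), ('Musk', 'SpaceX'), ('Page', 'Google')]
```

Requiring several distinct supporting pairs before accepting a pattern is one common way to limit semantic drift. Because of that requirement, the unrelated context "visited" is never learned in this toy run.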
Keywords: Relation Extraction, Domain Adaptive, Relation Type Discovery, Relation Seed Extraction, Relation Description Pattern Mining