
Research On Extracting Chinese Entity-relationship Based On Maximum Entropy Model

Posted on: 2011-11-12    Degree: Master    Type: Thesis
Country: China    Candidate: Y H Zhang
GTID: 2208330332976640    Subject: Computer application technology
Abstract/Summary:
Entity relation extraction aims to find the semantic relations that hold between entities in text from a particular domain and to store them in a structured form. It has wide applications in information retrieval and automatic question answering, and, as a key technology in information extraction, it is receiving more and more attention. Whereas an entity describes an object or a collection of objects, an entity relation captures an explicit or implicit semantic link between entities. The performance of an entity relation extraction system depends on several factors: detecting the entities correctly, determining their types correctly, and determining the relations between them correctly. A complete entity relation extraction system usually consists of five modules connected in sequence: NLP preprocessing, named entity recognition, pattern matching or classification, coreference resolution, and processing of newly found relations with standardized output. To build a relatively complete relation extraction system, this thesis organizes the work into three sequential modules: named entity recognition, coreference resolution, and entity relation extraction. The main contributions of this thesis are as follows.

1) Named entity recognition: As the front end of the relation extraction system, entity recognition is an important component. We used conditional random fields (CRF), a machine learning method, to recognize seven major entity types: persons, organizations, GPEs (geo-political entities), locations, vehicles, facilities, and weapons, and obtained good results.

2) Coreference resolution: A named entity may appear several times in the same text, and its mentions may take different forms, so the same relation between entities is often detected repeatedly. To address this problem, we extract feature vectors with rules and train a classifier with a support vector machine (SVM) to resolve coreference between entities.

3) Entity relation extraction based on the maximum entropy model: This is the main task and research focus of this thesis. We build the feature set for relation extraction from words, parts of speech, entities, and the corresponding combined features, and apply stop-word removal during feature construction. We use coreference resolution to remove duplicate entities so that the relation between the same pair of entities is not detected repeatedly. Experiments with the maximum entropy model for automatic entity relation extraction show that, compared with other supervised machine learning algorithms, the maximum entropy algorithm does not clearly improve the final results. On this basis, the experiments verify that the entities' word and part-of-speech features, stop-word removal, and combined features are highly useful for classification, and the final system achieves good results.

4) Demo: The system integrates the three connected modules (named entity recognition, coreference resolution, and entity relation extraction) to extract entity relations automatically. Finally, we designed three experiments to test each of the three models.
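The abstract does not give the CRF recognizer itself. As a hedged sketch of how a trained linear-chain CRF labels a sentence, the decoding step can be written as a Viterbi search over BIO tags for the seven entity types. The ACE-style tag names (PER, ORG, GPE, ...) are assumed here, and the emission and transition scores are hand-set, hypothetical numbers standing in for learned CRF weights; training is not shown.

```python
from typing import Dict, List, Tuple

# BIO tags over the seven entity types listed in the abstract (ACE-style labels, assumed).
ENTITY_TYPES = ["PER", "ORG", "GPE", "LOC", "VEH", "FAC", "WEA"]
TAGS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]
NEG_INF = -1e9  # score used to rule out invalid tag sequences


def bio_transitions() -> Dict[Tuple[str, str], float]:
    """Forbid I-X unless the previous tag is B-X or I-X; all other moves score 0."""
    trans = {}
    for prev in TAGS:
        for cur in TAGS:
            if cur.startswith("I-") and prev[2:] != cur[2:]:
                trans[(prev, cur)] = NEG_INF
    return trans


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF score.

    emissions[i][tag]: state score of `tag` at token i (missing = 0.0).
    transitions[(prev, cur)]: transition score (missing = 0.0).
    """
    # A sequence may not start with an I- tag.
    best = [{t: emissions[0].get(t, 0.0) + (NEG_INF if t.startswith("I-") else 0.0)
             for t in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            # Best predecessor for `cur`, given the scores at position i - 1.
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev] + transitions.get((prev, cur), 0.0)
                           + emissions[i].get(cur, 0.0))
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(emissions) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


# 张三 / 访问 / 北京 ("Zhang San visits Beijing"); hand-set scores, not learned ones.
tokens = ["张三", "访问", "北京"]
emissions = [{"B-PER": 2.0, "I-PER": 1.0}, {"O": 2.0}, {"B-GPE": 1.5, "B-LOC": 1.0}]
print(list(zip(tokens, viterbi(emissions, bio_transitions()))))
# [('张三', 'B-PER'), ('访问', 'O'), ('北京', 'B-GPE')]
```

In a real CRF the scores come from weighted feature functions over the tokens (characters, words, POS tags, and so on); the Viterbi search is the same whatever those features are.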
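The coreference step can be illustrated with a minimal sketch. It assumes a hypothetical rule-based feature set for a mention pair (exact string match, substring containment, same entity type, sentence proximity) and hypothetical weights in place of a trained SVM. A linear SVM classifies a pair by the sign of w·x + b. Merging every positive pair then gives one group per entity, so a relation between the same two entities needs to be detected only once.

```python
from typing import Dict, List, NamedTuple


class Mention(NamedTuple):
    text: str
    etype: str      # entity type, e.g. "PER", "ORG"
    sentence: int   # index of the sentence containing the mention


def pair_features(a: Mention, b: Mention) -> List[float]:
    """Rule-based feature vector for a mention pair (hypothetical feature set)."""
    return [
        1.0 if a.text == b.text else 0.0,                       # exact string match
        1.0 if a.text in b.text or b.text in a.text else 0.0,   # one contains the other
        1.0 if a.etype == b.etype else 0.0,                     # same entity type
        1.0 / (1 + abs(a.sentence - b.sentence)),               # sentence proximity
    ]


# Hypothetical weights standing in for a trained linear SVM: f(x) = w.x + b.
WEIGHTS = [1.0, 1.0, 1.5, 0.2]
BIAS = -2.0


def is_coreferent(a: Mention, b: Mention) -> bool:
    """Positive side of the linear decision function means 'same entity'."""
    score = sum(w * x for w, x in zip(WEIGHTS, pair_features(a, b))) + BIAS
    return score > 0.0


def cluster(mentions: List[Mention]) -> List[List[Mention]]:
    """Group mentions into entities by merging every pair classified coreferent."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if is_coreferent(mentions[i], mentions[j]):
                parent[find(i)] = find(j)
    groups: Dict[int, List[Mention]] = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m)
    return list(groups.values())


mentions = [
    Mention("清华大学", "ORG", 0),   # Tsinghua University
    Mention("张三", "PER", 0),       # Zhang San
    Mention("清华", "ORG", 1),       # short form of 清华大学
    Mention("张三", "PER", 2),
    Mention("北京", "GPE", 2),       # Beijing
    Mention("北京大学", "ORG", 2),   # Peking University: contains 北京 but differs in type
]
for group in cluster(mentions):
    print([m.text for m in group])
```

Keeping one representative mention per group before relation classification is what removes the duplicate entity pairs the abstract describes.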
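A maximum entropy classifier for relations is a multinomial log-linear model, p(y | x) ∝ exp(Σ_f w_{f,y}) over binary features f of the instance x. The sketch below assumes hypothetical feature templates (entity-type pair, word, part of speech, and a word plus type-pair combination), a hypothetical stop-word list, hypothetical relation labels, and a toy training set. It trains by plain full-batch gradient ascent on the conditional log-likelihood. Real maximum entropy toolkits commonly use iterative scaling or quasi-Newton optimizers and add a Gaussian prior, which are not shown here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

STOP_WORDS = {"的", "了", "于", "在"}      # hypothetical stop-word list
LABELS = ["EMPLOY", "LOCATED", "NONE"]    # hypothetical relation labels
Token = Tuple[str, str]                   # (word, part-of-speech tag)


def extract_features(types: Tuple[str, str], between: Sequence[Token]) -> List[str]:
    """Entity-type, word, POS and combined features; stop words are removed."""
    pair = f"{types[0]}-{types[1]}"
    feats = [f"T={pair}"]                      # entity-type pair
    for word, pos in between:
        if word in STOP_WORDS:
            continue                           # stop-word removal
        feats += [f"W={word}", f"P={pos}",     # word and part of speech
                  f"WT={word}|{pair}"]         # combined word + type-pair feature
    return sorted(set(feats))


def probs(w: Dict[Tuple[str, str], float], feats: List[str]) -> Dict[str, float]:
    """p(y | x) = exp(sum_f w[f, y]) / Z(x), computed stably via max-shift."""
    scores = {y: sum(w[(f, y)] for f in feats) for y in LABELS}
    m = max(scores.values())
    exp = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exp.values())
    return {y: e / z for y, e in exp.items()}


def train(data: List[Tuple[List[str], str]], iters: int = 1000, lr: float = 0.5):
    """Full-batch gradient ascent on the average conditional log-likelihood."""
    w: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(iters):
        grad: Dict[Tuple[str, str], float] = defaultdict(float)
        for feats, gold in data:
            p = probs(w, feats)
            for f in feats:
                for y in LABELS:
                    # observed minus expected count of feature (f, y)
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - p[y]
        for k, g in grad.items():
            w[k] += lr * g / len(data)
    return w


def predict(w: Dict[Tuple[str, str], float], feats: List[str]) -> str:
    p = probs(w, feats)
    return max(p, key=p.get)


# Toy training set: (entity types, tokens between the two entities, relation).
train_raw = [
    (("PER", "ORG"), [("就职", "v"), ("于", "p")], "EMPLOY"),   # 张三 就职于 清华大学
    (("PER", "ORG"), [("加入", "v"), ("了", "u")], "EMPLOY"),   # 张三 加入了 清华大学
    (("PER", "GPE"), [("住", "v"), ("在", "p")], "LOCATED"),    # 张三 住在 北京
    (("ORG", "GPE"), [("位于", "v")], "LOCATED"),               # 清华大学 位于 北京
    (("PER", "ORG"), [("批评", "v"), ("了", "u")], "NONE"),     # 张三 批评了 清华大学
]
data = [(extract_features(t, b), y) for t, b, y in train_raw]
model = train(data)
print(predict(model, extract_features(("PER", "ORG"), [("就职", "v"), ("了", "u")])))
```

Because 于 and 了 are both dropped as stop words, "就职于" and "就职了" yield the same feature set, which illustrates how stop-word removal lets the classifier generalize across function-word variation.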
Keywords/Search Tags: Named Entity, Coreference Resolution, Entity Relation Extraction, Information Extraction, Maximum Entropy Model