
The Study On Entity Relation Extraction In Chinese Text

Posted on: 2017-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: B Kong
Full Text: PDF
GTID: 2308330503986891
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, more and more texts containing valuable information are published online. Because the relations between entities in text are complex, manual, experience-based acquisition and organization of knowledge falls far short of the requirements of practical applications. Automatically extracting entity relations from Internet text is therefore becoming an important topic in natural language processing and information extraction research. A review of existing work shows that most relation extraction methods classify target relations only coarsely, and that few studies address complicated entity relations. To this end, this study investigates automatic relation extraction methods for two typical kinds of complicated entity relations: relations between persons and relations between financial entities.
The main work in this study consists of two parts. First, an automatic method for extracting relations between person entities is investigated. Based on an in-depth analysis of how person relations are expressed in text, a person relation extraction method is designed and implemented that uses one single-category classifier for each specific target relation. The method decides whether each target relation holds by using features drawn from contextual information and sentence structure. To reduce the influence of the imbalanced distribution among relation types, random over-sampling is applied. The method achieves an F-measure of 0.6751 on the person relation extraction dataset of the contest held at the 15th China Conference on Machine Learning. Second, an automatic method for extracting relations between entities in the financial domain is investigated. 
Because no publicly available annotated relation corpus exists for the financial domain, a specification is developed that defines an entity relation framework for the domain together with the corresponding annotation guidelines. Following these guidelines, the relations between entities in financial text are annotated, yielding a financial-domain entity relation corpus that contains 7 relation types and 1417 entity relation instances. Considering how relations between financial entities are expressed in text, a feature extraction method based on a partitioned bag-of-words model and rules is proposed. The resulting features are fed into a random forest classifier to extract relations in the financial domain. This method achieves an F-measure of 0.6787, which is clearly better than traditional entity relation extraction methods. The experimental results show that the partitioned bag-of-words model effectively extracts descriptive features of entity relations, and that the random forest model is effective for entity relation extraction because it combines contextual description features with rule combination features.
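
The abstract does not spell out how the sentence is partitioned, so the following is only a minimal sketch of one plausible reading of a partitioned bag-of-words feature: the token sequence is split into the segment before the first entity, the segment between the two entities, and the segment after the second entity, and a separate word-count block is built for each segment. The function name `partitioned_bow`, the three-way split, the vocabulary, and the example sentence are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def partitioned_bow(tokens, e1_idx, e2_idx, vocab):
    """Build one bag-of-words count block per sentence partition.

    Assumed partitions (not specified in the abstract): tokens before the
    first entity, tokens between the two entities, tokens after the second.
    The entity tokens themselves are excluded from every partition.
    """
    left, right = sorted((e1_idx, e2_idx))
    partitions = [
        tokens[:left],            # before the first entity
        tokens[left + 1:right],   # between the two entities
        tokens[right + 1:],       # after the second entity
    ]
    features = []
    for part in partitions:
        counts = Counter(part)
        # Counter returns 0 for words that do not occur in this partition.
        features.extend(counts[word] for word in vocab)
    return features


# Hypothetical, already word-segmented sentence (English stand-in tokens).
tokens = ["yesterday", "CompanyA", "acquired", "shares", "of", "CompanyB", "publicly"]
vocab = ["acquired", "shares", "yesterday", "publicly"]

vec = partitioned_bow(tokens, e1_idx=1, e2_idx=5, vocab=vocab)
print(vec)  # 3 partitions x 4 vocabulary words = 12 counts
```

Keeping a separate block per partition lets a classifier distinguish, for example, a verb that appears between the two entities from the same verb appearing after them. Vectors of this kind, possibly concatenated with rule-based indicator features, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`; that pairing is an illustration, not a description of the author's actual pipeline.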
Keywords/Search Tags: relation extraction, person relationship extraction, financial relation