
Relation Extraction Based on Fusion of Corpus- and Sentence-level Features

Posted on: 2020-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Ma
Full Text: PDF
GTID: 2428330578457148
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the widespread use of the Internet has greatly reduced the cost of producing and transmitting information, which makes it far more convenient for people to obtain information. We can now get a wide variety of news from the Internet instead of reading newspapers. However, this rapid development has also brought problems such as information explosion and information overload, which make it harder to find the information we really need. How to use information extraction methods to automatically obtain valuable information from the massive amounts of data on the Internet is therefore of great significance. Once information can be processed automatically, it becomes easier to collect information and make decisions.

At present, mainstream information extraction methods are mostly based on neural networks, which extract semantic features for entity pairs from the sentence context or use extra descriptive information as additional features. These state-of-the-art methods may overlook two problems: a sentence may be too short to carry rich contextual information, and sufficient external information may be difficult to obtain. To address these problems, we put forward the idea that the global context information of entities across the entire corpus can be useful for the relation extraction task, and we propose two novel relation extraction methods that combine corpus-level entity co-occurrence information with sentence-level semantic features.

Firstly, two new Chinese-language datasets are contributed for the evaluation of relation extraction. Because relation extraction datasets are scarce and the widely used benchmark datasets are all in English, we construct two Chinese-language datasets from the encyclopedia website Baidu Baike and from news website data. With both Chinese-language and English-language datasets, we can observe whether model training differs across languages.

Secondly, we propose a model, named RASNN, that combines relation influence with corpus-level features. Considering the mutual influence and restriction between relations, weak relations in the corpus between any entity pairs can serve as evidence for the relation classification task. We propose the concept of the relation influence of entities, which is captured by a network-level attention mechanism in our model, and combine it with sentence-level features to complete the relation extraction task. This model extracts features for entity pairs from both the macro and the micro context, and overcomes the shortcoming that a single sentence may not contain enough contextual information.

Thirdly, we propose a novel model, named CNSSNN, that makes full use of both corpus-level context features and sentence-level semantic features for relation extraction. In this model, we first build an entity co-occurrence network from the entire corpus. Then we introduce a network-level attention mechanism combined with graph embedding to capture corpus-level environmental information for entities. Meanwhile, we employ an attention-based Bi-GRU network to extract sentence-level semantic features for entity pairs. Finally, we combine the corpus-level and sentence-level features to classify relations. The experimental results on two manually labeled datasets and two widely used benchmark datasets show that our approach clearly and consistently outperforms other existing approaches in both precision and recall.

Finally, a series of comparative experiments and analyses are conducted on the two manually labeled datasets and the two widely used benchmark datasets. The experimental results show that the RASNN and CNSSNN models proposed in this thesis can effectively and simultaneously capture sentence-level semantic features from the microscopic perspective and corpus-level global context information from the macroscopic perspective, and that they clearly outperform other state-of-the-art relation extraction methods.
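
The abstract does not specify how the entity co-occurrence network is built, so the following is only a minimal sketch under stated assumptions: two entities are linked when they appear in the same sentence, and edge weights are raw co-occurrence counts. The function names, the toy corpus, and the softmax over counts are all hypothetical. In particular, the softmax is a fixed stand-in for illustration and is not the learned network-level attention mechanism described in the thesis.

```python
from collections import defaultdict
from itertools import combinations
import math


def build_cooccurrence_network(sentences):
    """Return {entity: {neighbor: count}} from per-sentence entity lists.

    Assumption: two entities co-occur when they appear in the same sentence.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for entities in sentences:
        # Count each unordered pair once per sentence, ignoring duplicates.
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {entity: dict(nbrs) for entity, nbrs in graph.items()}


def neighbor_weights(graph, entity):
    """Softmax over co-occurrence counts of an entity's neighbors.

    Hypothetical stand-in for attention: the weights are not learned.
    """
    nbrs = graph.get(entity, {})
    if not nbrs:
        return {}
    peak = max(nbrs.values())  # subtract the max for numerical stability
    exps = {n: math.exp(c - peak) for n, c in nbrs.items()}
    total = sum(exps.values())
    return {n: v / total for n, v in exps.items()}


# Toy corpus (hypothetical): each item lists the entities found in one sentence.
corpus = [
    ["Beijing", "China"],
    ["Beijing", "China", "Tsinghua University"],
    ["Tsinghua University", "Beijing"],
]
network = build_cooccurrence_network(corpus)
weights = neighbor_weights(network, "Beijing")
```

In a full model, weights of this kind would be replaced by attention scores computed from trainable entity embeddings, and the resulting corpus-level vector would be combined with the sentence-level features before classification.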
Keywords/Search Tags: relation extraction, entity co-occurrence network, attention mechanism, relation classification