
Research On Chinese Entity Relationship Extraction Method Integrating External Knowledge

Posted on: 2024-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wang
Full Text: PDF
GTID: 2568307181454084
Subject: Computer application technology
Abstract/Summary:
Extracting entity relationship triplets from unstructured text is a basic task in natural language processing and is of great significance for building knowledge graphs. On the one hand, researchers try to add useful external knowledge to the model to enrich the information expressed in the text. On the other hand, they keep simplifying the model architecture to reduce the time and space complexity of the model and to improve both the accuracy and the efficiency of extraction. However, problems such as insufficient use of external knowledge and complex model construction remain, so triplet extraction is still an active topic in natural language processing. This thesis therefore studies methods for extracting Chinese entity relationship triplets by integrating external knowledge. The main work is as follows:

(1) Chinese entity relationship extraction mostly processes text as character sequences, which leads to problems such as insufficient semantic representation of characters and semantic forgetting over long character sequences. Therefore, a relation-oriented extraction method integrating dependency syntax information is proposed. The input layer takes character sequences and word sequences based on synonym representation as inputs. The encoder uses a long short-term memory network to encode the text and adds global dependency information to generate a representation of relationship gates. The decoder adds dependency type information and, under the action of the relationship gates, decodes the entity relationship triplets using a bidirectional long short-term memory network. The F1 values of this method on the Chinese datasets SanWen, FinRE, DuIE, and IPRE are 5.84%, 2.11%, 2.69%, and 0.39% higher than the baseline method, respectively. The ablation experiments show that the proposed representations of global dependency information and dependency type information both improve extraction performance, and that performance on long sentences and on entities that are far apart is stable and superior to the baseline method.

(2) Existing joint extraction models ignore the strong correlation between the parts of a triplet. Therefore, building on the previous work, an entity relationship triplet classification method integrating pinyin and glyph information is proposed. The text is first processed by the Chinese pre-trained model ChineseBERT, whose output text vectors contain rich context, glyph, and pinyin information. Then the triplet classification algorithm SETC decides whether the current triplet is correct. During this process, the model parameters are continuously updated by comparing the predictions with the pre-annotated triplet labels. Compared with method (1), this method improves by 0.58%, 0.51%, 0.69%, and 0.26%, respectively. Relevant experiments show that the proposed triplet classification method not only captures rich interactions among the parts of a triplet but also keeps the advantages of high computational efficiency and easy training.

Overall, the research in this thesis integrates external knowledge from different perspectives to extract triplets. The next step will be to consider integrating other types of external knowledge and to select a more portable and higher-performance extraction model.
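
The results above are reported as F1 values. As a rough illustration of how such a score is commonly computed for triplet extraction, the following minimal sketch assumes exact-match evaluation, where a predicted triplet counts as correct only if its subject, relation, and object all match a gold triplet. The abstract does not state the exact matching criterion, so this is an assumption; the function name `triplet_f1` and the example triplets are hypothetical and are not taken from the thesis or its datasets.

```python
from typing import Iterable, Tuple

# A triplet is (subject, relation, object). Exact-match scoring is an
# assumption here; the thesis abstract does not specify the criterion.
Triplet = Tuple[str, str, str]


def triplet_f1(predicted: Iterable[Triplet], gold: Iterable[Triplet]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact triplet matching."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)  # triplets matching in all three parts
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example data, made up for illustration only.
gold_triplets = [
    ("Lu Xun", "author_of", "Diary of a Madman"),
    ("Lu Xun", "born_in", "Shaoxing"),
    ("Shaoxing", "located_in", "Zhejiang"),
]
predicted_triplets = [
    ("Lu Xun", "author_of", "Diary of a Madman"),  # correct
    ("Lu Xun", "born_in", "Shaoxing"),             # correct
    ("Lu Xun", "born_in", "Zhejiang"),             # wrong object
    ("Shaoxing", "author_of", "Lu Xun"),           # wrong relation and order
]

precision, recall, f1 = triplet_f1(predicted_triplets, gold_triplets)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With two of four predictions correct against three gold triplets, precision is 1/2, recall is 2/3, and F1 is 4/7. Under this kind of scoring, a partially correct triplet earns no credit, which is one reason methods that model the correlation between the parts of a triplet can matter.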
Keywords/Search Tags:entity relationship extraction, external knowledge, relationship orientation, triplet classification