Research And Implementation Of Entity Relation Extraction Based On Generative Adversarial Networks

Posted on: 2022-10-23  Degree: Master  Type: Thesis
Country: China  Candidate: K G He  Full Text: PDF
GTID: 2518306338970089  Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet information technology, online media and social accounts generate a large amount of text data every day, and this data contains great value. Information Extraction (IE) technology can extract useful information from these texts. This thesis mainly focuses on entity relation extraction (Relation Extraction, RE), a subtask of information extraction. RE aims to extract semantic relations between entity pairs in unstructured text, so that large amounts of unstructured data can be transformed into structured data that is easy to store and analyze. RE is widely used in artificial intelligence fields such as knowledge graphs, automatic question answering and search engines.

At present, RE is mainly based on deep learning and includes tasks such as supervised RE and distantly supervised RE. Deep learning uses neural network models to extract text features automatically; how to make a neural network model fully exploit the semantic information of the text is a difficult issue. Supervised deep learning models rely on large-scale training data, which requires time-consuming human annotation. Distant supervision for RE is an efficient way to address this issue: entity pairs in unstructured text are aligned with the triples stored in an external knowledge base (KB), so that large-scale training data can be labeled automatically. However, because the relation between two entities is uncertain, distant supervision introduces noisy data, and suppressing this noise has become a hard problem.

This thesis studies the basic theory and development of RE technology, the status quo at home and abroad, and deep learning techniques such as convolutional neural networks and generative adversarial networks. In-depth research has been carried out on deep learning and noise suppression. The main contents and contributions of this thesis are as follows:

(1) An attention-based Position Attention Long Short-Term Memory neural network (PALSTM) is proposed. Previous deep learning models ignore the importance of entity position in the text for the entity relation, which makes it difficult to explore the semantics of the entity context. According to the position of each word relative to the entities, PALSTM assigns a different weight to each word based on the linguistic attention mechanism, and obtains the vector representation of the text by applying these attention weight coefficients to the word vectors. Experiments show that this method effectively extracts the semantic information of the entity context and the position information of each word. It therefore improves relation extraction and achieves better accuracy than other models on medium- and long-distance relation extraction.

(2) A Piecewise Convolutional Neural Network-Generative Adversarial Network (PGAN) model is proposed. Because most supervised datasets lack sufficient training data and distant supervision suffers from noisy data, this thesis proposes a heuristic algorithm that replaces the external knowledge base with manually labeled training data and uses it to label unlabeled data automatically. The noisy data produced by this heuristic labeling are used for the adversarial training of PGAN. PGAN is based on a PCNN encoder, which extracts sentence features through a piecewise convolutional network, and consists mainly of a generator and a discriminator. The generator is a perceptron over the PCNN encoder that selects valid instances from the automatically labeled data. The discriminator is a classifier over the PCNN encoder that increases resistance to noise and improves RE performance. Experiments on the SemEval-2010 Task 8 and NYT datasets show that the PGAN generator effectively filters noisy data. The discriminator outperforms other relation extraction models such as PCNN and R-BERT, with an F1 value of 89.61%.

(3) An entity relation extraction system based on the above techniques is implemented. Given a piece of input text, the system performs entity relation extraction. The system consists of a data layer, a model layer and a visualization layer. The data layer is based on the graph database Neo4j, which stores the triples extracted by RE and supports queries over them. The model layer is composed of PALSTM and PGAN and provides the relation extraction algorithms. The visualization layer is based on the Django framework and presents the system functions and the query function for triples stored in the database.
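
The distant supervision step described in the abstract, aligning entity pairs in text with knowledge base triples, can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy knowledge base, the example sentences, the relation names and the function name `distant_label` are all hypothetical, and entity matching is reduced to plain substring search.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Steve Jobs", "Apple"): "founder_of",
    ("Paris", "France"): "capital_of",
}

# Hypothetical unlabeled corpus.
SENTENCES = [
    "Steve Jobs co-founded Apple in 1976.",
    "Steve Jobs left Apple in 1985.",  # mentions the pair, not the relation
    "Paris is the capital of France.",
    "Berlin is a city in Germany.",    # no KB pair, stays unlabeled
]


def distant_label(sentences, kb):
    """Assign a KB relation to every sentence mentioning both entities.

    Assumption for illustration: entities are matched by substring search,
    where a real system would use entity recognition and linking.
    """
    bags = defaultdict(list)
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


bags = distant_label(SENTENCES, KB)
for triple, instances in bags.items():
    print(triple, len(instances))
```

In this toy corpus the second sentence receives the label `founder_of` although it does not express that relation. This is the kind of noisy instance that distant supervision introduces and that a noise suppression step has to filter out.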
Keywords/Search Tags:relation extraction, position attention, long short-term memory neural network, piecewise convolution, generative adversarial network