
Automatic Text Summarisation And Relation Extraction Based On Deep Learning

Posted on: 2022-08-15    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Y Liang    Full Text: PDF
GTID: 1488306326980499    Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the era of artificial intelligence and big data, text resources have grown exponentially, making it increasingly difficult for people to obtain the information and knowledge they need in time. How to quickly obtain the information of interest from massive text data has therefore become one of the research hotspots in natural language processing. This dissertation studies two methods for obtaining information quickly: automatic text summarization and relation extraction.

The purpose of automatic text summarization is to compress and summarize document content so that the generated summary expresses the main idea of the article. Automatic text summarization can be divided into two types: extractive summarization and generative (abstractive) summarization. The former extracts the required sentences or words directly from the original text, while the latter generates sentences or words that do not all come from the original text. Generative summarization is more in line with the thinking mode of the human brain, but it also faces greater challenges. First, generative models tend to encode longer text passages inaccurately. Second, they suffer from word redundancy and sentence repetition.

The purpose of relation extraction is to predict the relation type of a given entity pair in a sentence, and it plays an indispensable role in important natural language tasks such as large-scale knowledge graph construction, knowledge reasoning, and knowledge question answering. Although relation extraction has made progress in recent years, two key problems remain to be solved. First, most existing work divides relation extraction into two separate sub-tasks, named entity recognition and relation classification, which lacks interaction between the two tasks and cannot deal well with relational triples that contain overlapping entities. Second, most current models are limited to sentence-level relation extraction. Extracting relations between entities spread across a document is more challenging in practice, because the model must understand multiple sentences and infer the relationship between two entities by synthesizing relevant information from the whole document.

To address these problems, this dissertation carried out research on automatic text summarization and relation extraction based on deep learning. The main achievements and innovations are as follows:

(1) The features of a text sequence obtained by a Bi-LSTM encoder are relatively shallow semantic features, and as the sequence grows longer the traditional Seq2Seq model generates much irrelevant information and incoherent summaries. To solve this problem, Seq2Seq models based on a selective gate mechanism are proposed, which effectively filter redundant information and make the generated summaries more coherent and closer to the main idea of the text. The traditional Seq2Seq model uses cross-entropy to bring the generated summary sequence ever closer to the target sequence; however, the scoring criteria of text summarization (ROUGE-1, ROUGE-2, and ROUGE-L) differ from this optimization objective, and the best optimization result does not imply the highest score. Therefore, a reinforcement learning optimization strategy is adopted to further improve the performance of the model. The validity of the selective-gate Seq2Seq summarization algorithm is verified through experiments. First, on the public LCSTS dataset, it generally achieves better results than the currently dominant SEASS, DRGD, and SUPERAE baselines: the model with the selective gate mechanism is 1.7% (ROUGE-1), 1.5% (ROUGE-2), and 1.5% (ROUGE-L) higher than the basic Seq2Seq model, and the model with both the selective gate mechanism and the reinforcement learning optimization strategy is 2.6% (ROUGE-1), 2.1% (ROUGE-2), and 2.5% (ROUGE-L) higher. Second, to analyze the ROUGE scores of tweets of different lengths, the tweets were divided into 11 length groups; comparing scores across lengths shows that the selective-gate model outperforms the traditional Seq2Seq model for most tweet lengths. Furthermore, human evaluation of the generated summaries shows that those produced by the proposed model are more semantically relevant and coherent than those of the basic Seq2Seq model.

(2) Seq2Seq models based on RNN (LSTM, GRU) and CNN encoders have a strong ability to represent sequences, but a sequence is a simple structure, while real text has a more complex graph structure that traditional Seq2Seq models cannot encode without destroying text semantics. To solve these problems, a Gated Graph-based Attention Network (GGNANs) model is proposed, which combines a Bi-LSTM with a gated graph neural network (GGNN) to encode text sequences. To better mine the graph structure information contained in text sequences, a graph construction method based on PMI, self-links, forward links, and backward links is proposed, which better combines graph-structure information with sequence-structure information. The validity of this gated-graph attention summarization algorithm is verified by experiments. First, on LCSTS and Gigaword, the proposed model outperforms existing strong baselines. Second, the improved GGNN and the GGNN proposed by Beck et al. were analyzed and compared: the improved GGNN is 1.07% (ROUGE-1), 0.69% (ROUGE-2), and 1.03% (ROUGE-L) higher than the original. Further, to analyze the influence of sliding-window size on performance, different window sizes were compared; the model achieves its best performance with a window size of 20. Finally, human evaluation shows that summaries generated by the proposed model have more semantic relevance and diversity than those generated by the traditional Seq2Seq model.

(3) Most existing work divides relation extraction into two independent sub-tasks, named entity recognition and relation classification, which lacks interaction between the two within a sentence and cannot handle overlapping entity-relation triples well. To solve these problems, a joint entity and relation extraction model based on Seq2Seq, called Seq2Seq-RE, is proposed. This model combines gated graph neural networks with the Seq2Seq attention model to realize the interaction between named entity recognition and relation classification, and can better extract overlapping entity-relation triples. Its effectiveness is verified through several groups of experiments. First, Seq2Seq-RE achieves the best performance to date, exceeding the most advanced methods (WDEC and PDEC) by 1.7% and 0.8% in F1 score on NYT29 and NYT24, respectively. Second, the effects of different edge types on performance are analyzed; the model performs best when retaining syntactic dependency edges, self-links, and backward links. Furthermore, the extracted entity-relation triples are evaluated manually, and the results show that Seq2Seq-RE can accurately extract complex and variable triples. Finally, to analyze performance on triples with different overlap types, the test set was divided into normal, single-entity-overlap, and double-entity-overlap types. The results show that Seq2Seq-RE has better overall performance than the current best WDEC and PDEC models: its F1 scores on the normal and double-entity-overlap types are 2.6% and 1.1% higher than those of WDEC and PDEC, respectively.

(4) Recurrent neural networks, convolutional neural networks, and graph neural networks have achieved good results in sentence-level relation extraction, but these methods are limited to identifying relationships between entities within a sentence. Document-level relation extraction is a more challenging task, in which models need to understand multiple sentences and infer relationships between entities by synthesizing relevant information from the entire document. To solve this problem, a document-level relation extraction algorithm based on deep graph inference, called BERT-GGNNs, is proposed. In this model, a deep gated graph inference network is established by combining an improved gated graph neural network with a learnable correlation weight matrix, making it easier for the model to mine relationships between entities hidden in the document. To obtain deep document semantic information, BERT is adopted in the encoding module to further improve performance. The effectiveness of this document-level relation extraction algorithm is verified by experiments. First, BERT-GGNNs performs best: its Ign F1 and F1 scores on the DocRED document-level relation extraction dataset are both 0.3% higher than the most advanced method (BERT-LSR). Second, the entity-relation triples extracted by the proposed model are evaluated manually; experiments show that BERT-GGNNs can accurately extract triples spanning multiple sentences.

(5) An automatic text summarization and relation extraction system based on deep learning is implemented by integrating the four algorithms above: the summarization algorithm based on the selective gate mechanism, the summarization algorithm based on the gated graph attention network, the joint entity and relation extraction algorithm based on the Seq2Seq model, and the document-level relation extraction algorithm based on deep gated graph reasoning. The system is composed of four corresponding modules, and provides reliable evaluation and intuitive display in terms of its interface, user experience, and other aspects. Systematic tests verify the effectiveness of the deep-learning-based automatic text summarization and relation extraction methods.
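The selective gate mechanism described in contribution (1) can be illustrated with a minimal sketch. This is not the dissertation's implementation; it only shows the commonly used form of such a gate, in which each encoder hidden state is element-wise scaled by a sigmoid gate computed from that state and a sentence-level representation. All parameter names (`W`, `U`, `b`) and the random toy inputs are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selective_gate(H, s, W, U, b):
    """Apply a selective gate to encoder states.

    H : (T, d) encoder hidden states, one row per source token
    s : (d,)   sentence-level representation (e.g. the final encoder state)

    Returns H' = H * sigmoid(H @ W + s @ U + b); the gate lies in (0, 1),
    so tokens judged irrelevant to the main idea are down-weighted.
    """
    g = sigmoid(H @ W + s @ U + b)   # (T, d) gate values in (0, 1)
    return H * g

# Toy example with random parameters
rng = np.random.default_rng(0)
T, d = 5, 8
H = rng.normal(size=(T, d))
s = rng.normal(size=(d,))
W = rng.normal(size=(d, d)) * 0.1
U = rng.normal(size=(d, d)) * 0.1
b = np.zeros(d)

H_gated = selective_gate(H, s, W, U, b)
print(H_gated.shape)  # (5, 8)
```

Because the gate is strictly between 0 and 1, the gated states never exceed the original states in magnitude; the decoder then attends over `H_gated` instead of `H`.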
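Contributions (2)-(4) all build on gated graph neural networks (GGNNs). A minimal sketch of one GGNN propagation step, under the standard formulation (neighbour aggregation followed by a GRU-style gated update), is shown below. The adjacency matrix here uses only self, forward, and backward links as a stand-in for the dissertation's richer PMI-based graph; the weight matrices and toy inputs are illustrative assumptions, not the author's parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(H, A, Wz, Uz, Wr, Ur, Wh, Uh):
    """One propagation step of a gated graph neural network.

    H : (N, d) node states, one row per token/node
    A : (N, N) adjacency matrix (here: self + forward + backward links)

    Each node aggregates its neighbours' states and updates its own
    state through GRU-style update (z) and reset (r) gates.
    """
    M = A @ H                             # (N, d) aggregated neighbour messages
    z = sigmoid(M @ Wz + H @ Uz)          # update gate
    r = sigmoid(M @ Wr + H @ Ur)          # reset gate
    Hc = np.tanh(M @ Wh + (r * H) @ Uh)   # candidate state
    return (1 - z) * H + z * Hc           # gated interpolation

rng = np.random.default_rng(1)
N, d = 4, 6
H = rng.normal(size=(N, d))
A = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)  # self/forward/backward links
params = [rng.normal(size=(d, d)) * 0.1 for _ in range(6)]

H1 = ggnn_step(H, A, *params)
print(H1.shape)  # (4, 6)
```

Stacking several such steps lets information propagate along multi-hop paths in the graph, which is what allows the document-level model to connect entity mentions in different sentences.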
Keywords/Search Tags:text summarization, seq2seq attention mechanism, joint entity and relation extraction, gating graph reasoning