Research And Application Of Automatic Text Summarization Model Based On Deep Learning

Posted on: 2021-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zhang
Full Text: PDF
GTID: 2428330647960895
Subject: Computer Science and Technology
Abstract/Summary:
Automatic text summarization is an information compression technology that automatically converts a text, or a collection of texts, into a short summary. Abstractive summarization compresses and restates the semantics of the original text, and is characterized by being allowed to generate words that do not appear in the source. The development of neural networks in recent years has greatly advanced abstractive summarization, but problems remain in this field: semantic irrelevance, repeated words, and an inconsistency between the evaluation method and the loss function. In addition, text summarization technology is difficult to apply in professional fields. This thesis proposes an improvement for each of these problems. The basic structure of an abstractive summarization model consists of an encoder and a decoder, and the main contributions of this thesis are as follows:

(1) Semantic irrelevance. The traditional encoder uses a bidirectional RNN to encode word vectors or character vectors, and the vectors produced by the RNN are then fed to the decoder to generate the output sequence. This method lacks dynamic semantic understanding. This thesis instead uses BERT (Bidirectional Encoder Representations from Transformers) as the encoder. BERT uses a bidirectional Transformer structure to capture dynamic word or character vectors that change with context, and these represent the semantics more completely and accurately.

(2) Completeness of the summary, out-of-vocabulary (OOV) words, and repeated words. The mainstream copy-generation network only copies some of the words in the source document to form a summary, whereas the way humans write summaries is abstract and general. This thesis proposes a pointer-generator network that incorporates a prior distribution, and introduces segment embedding to compress the source text, which guides the copy mechanism of the pointer-generator network as an aid to text generation.

(3) Mismatch between the loss function and the evaluation method ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The loss function cannot directly optimize this discrete metric. This thesis introduces a reinforcement learning method that can optimize ROUGE, so that the generated summary overlaps more closely with the reference summary. The method also greatly improves the ROUGE score, and the experimental results verify its effectiveness.

(4) Difficulty of applying text summarization technology. This thesis builds its own data set for an industrial field and transfers the model to it. The experimental results show that the model transfer is effective. To take the application further, this thesis develops a text summarization service, with the transferred model at its core, to summarize customer complaints. The service is integrated with the complaint management module of an enterprise customer relationship management platform in industrial manufacturing, and the integration demonstrates the application value of the summarization technology.
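As an illustration of contribution (3), the sketch below shows one common way a ROUGE score can be turned into a training signal. The abstract does not state which reinforcement learning algorithm or which ROUGE variant the thesis uses, so this assumes a self-critical policy gradient with ROUGE-1 F1 as the reward. The function names `rouge_n_f1` and `self_critical_loss` are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two token lists, using clipped n-gram overlap."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def self_critical_loss(sample_log_probs, sampled, greedy, reference):
    """Assumed self-critical loss: -(r(sampled) - r(greedy)) * sum(log p).

    sample_log_probs holds the log-probability the model assigned to each
    token of the sampled summary. The greedy summary acts as the baseline.
    """
    advantage = rouge_n_f1(sampled, reference) - rouge_n_f1(greedy, reference)
    return -advantage * sum(sample_log_probs)
```

When the sampled summary scores higher than the greedy one, the advantage is positive and minimizing the loss raises the probability of the sampled tokens. In a real model the log-probabilities would be differentiable tensors rather than plain floats.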
Keywords/Search Tags: Deep learning, Text summarization, Pre-trained model, Prior distribution, Model transfer, Complaint management