Research On Key Issues Of Automatic Text Summarization Technology

Posted on: 2021-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 2428330623467775
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the network has become the main source of information. Rich and diverse information resources bring great convenience, but the sheer volume of text also makes it hard to find key information quickly. Compressing and extracting text with automatic text summarization has become an effective way to obtain high-quality information in the era of information explosion. This thesis studies automatic text summarization, focusing on deep-learning-based abstractive summarization and dialogue text summarization. The main work is as follows:

(1) Encoder-Decoder based abstractive summarization has two problems: the evaluation metric differs from the training objective function, and the model suffers from exposure bias. A generative adversarial network (GAN) can address both, but it introduces difficulties of its own: optimizing over discrete data and generating text under conditions. To solve these problems, this thesis combines the advantages of the two methods by introducing adversarial training into the traditional Encoder-Decoder framework. After the Encoder-Decoder has been pre-trained to good performance, a GAN is used to learn and optimize the encoding of the complete sequence. The evaluation metric guides the optimization of the model, and the problems of discrete data processing and conditional generation are avoided. Experiments show that the proposed method improves the performance of the abstractive summarization model.

(2) Because relevant conferences provide no large-scale standard dataset, dialogue text summarization is difficult to model end to end with deep learning. Compared with essay text, dialogue text is longer overall, its sentences are shorter, and its topics are discretely distributed, so traditional essay summarization methods do not achieve good results. For the out-of-vocabulary (OOV) problem, named entity recognition is used to reduce the number of OOV words in dialogue text. For semantic vector representation, this thesis proposes a temporal self-supervised encoder that constructs dialogue sentence vectors carrying temporal information. To handle the discrete distribution of topics, the dialogue text is divided into topics by a self-supervised segmentation model and unsupervised clustering, forming complete dialogue subsets. Then, according to the characteristics of dialogue text, two kinds of summarization methods are proposed: abstractive summarization and template-based summarization. The approach relies mainly on unsupervised and self-supervised models, which overcomes the shortage of labeled samples. Experiments on dialogue datasets verify its effectiveness.

(3) Based on the above research, a web-based prototype of an automatic text summarization system is designed and implemented. With simple operations, users can try the summarization models of this thesis on a web page.
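
The mismatch between evaluation metric and objective function mentioned in point (1) can be illustrated with a small sketch. The abstract does not name the metric, so ROUGE-1 F1 is assumed here purely for illustration; the function name `rouge1_f1` and the example sentences are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary.

    Assumption: whitespace tokenization and lowercasing; real evaluations
    usually add stemming and language-specific tokenization.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as in the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
paraphrase = "the cat is on the mat"   # one word differs
shuffled = "mat the on sat cat the"    # same words, different order

print(rouge1_f1(paraphrase, reference))
print(rouge1_f1(shuffled, reference))
```

The shuffled sentence receives a perfect unigram score even though a token-level cross-entropy loss would penalize it at almost every position. This is one simple way to see that a model trained on the likelihood objective is not directly optimized for the score it is evaluated on.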
Keywords/Search Tags:Automatic Text Summarization, Encoder-Decoder, Generative Adversarial Network, Dialogue Text, Self-Supervised Learning