With the development of Internet technology, people face a large amount of text information every day. To help users quickly find the information they need in this explosively growing online environment, this thesis expands query terms through paraphrase generation to improve the performance of information retrieval systems. At the same time, automatic text summarization is used to analyze the retrieved results, extract the key information, and generate more concise text. This thesis explores deep-learning-based text generation techniques and studies two tasks: paraphrase generation and domain-specific automatic text summarization. We propose solutions to the problems in paraphrase generation, such as the lack of training corpora and the limited diversity of the generated text, as well as to the problems in domain-specific summarization, such as out-of-vocabulary words, long-distance dependencies, and summaries whose structure does not match the conventions of the target domain. The main work of this thesis is as follows:

(i) We design and implement a sequence-to-sequence paraphrase generation model consisting of a BERT extractor and an LSTM generator. The BERT extractor, with multiple layers of bidirectional attention, extracts deep language features from the input, while the LSTM generator, a well-trained language model, generates the paraphrases. Because existing paraphrase corpora are insufficient and unbalanced, paraphrasing models usually struggle to generate language that is both fluent and accurate. To address this problem, we train our model by combining the feature-based and fine-tuning strategies. We also propose a context-based paraphrase generation method that enables the model to paraphrase at the chapter level. In the prediction stage, we use a diverse beam search generation strategy in place of traditional beam search or greedy sampling, which helps the model generate multiple paraphrases with different expressions. Experimental results on three datasets of different granularity show that our training scheme and generation strategy are effective: a well-trained paraphrase generation model can produce multiple high-quality paraphrases.

(ii) We design and implement a sequence-to-sequence text summarization model with two independent encoders. In contrast to the traditional single-encoder architecture, we use a separate encoder to extract the structural (frame) features contained in existing summary texts and feed them to the model as additional information, helping it generate summaries that follow a specific structural pattern. To address the long-distance dependency problem caused by the length of the source text in abstractive summarization, the entire model is built on the attention mechanism. We also preprocess the dataset with byte pair encoding to solve the out-of-vocabulary problem in summary generation. The results show that our model can generate high-quality summaries that conform to the domain's structural pattern.
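The diverse beam search strategy used in the prediction stage of (i) can be sketched as follows. This is a minimal illustrative implementation in plain Python, not the thesis's actual decoder: the names `score_fn` and `toy_score`, the toy vocabulary, and the simplifications (a Hamming-style penalty on tokens already chosen by earlier groups at the same step, folded directly into the running score) are our assumptions.

```python
import math

def diverse_beam_search(score_fn, vocab, beam_width, num_groups, max_len,
                        diversity_strength=0.5):
    """Diverse beam search sketch: the beam is split into groups, and each
    group is penalized for reusing tokens that earlier groups already
    selected at the same time step, encouraging varied outputs."""
    assert beam_width % num_groups == 0
    group_size = beam_width // num_groups
    # Each group keeps its own hypotheses as (token_sequence, running_score).
    groups = [[((), 0.0)] for _ in range(num_groups)]
    for _ in range(max_len):
        tokens_used = []  # tokens picked by earlier groups at this step
        new_groups = []
        for beams in groups:
            candidates = []
            for seq, score in beams:
                for tok in vocab:
                    # Penalize tokens already chosen by earlier groups here.
                    penalty = diversity_strength * tokens_used.count(tok)
                    candidates.append((seq + (tok,),
                                       score + score_fn(seq, tok) - penalty))
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:group_size]
            tokens_used.extend(seq[-1] for seq, _ in beams)
            new_groups.append(beams)
        groups = new_groups
    return [beam for beams in groups for beam in beams]

# Toy unigram scorer (ignores context): "a" is the globally likeliest token.
def toy_score(seq, tok):
    return math.log({"a": 0.5, "b": 0.3, "c": 0.2}[tok])

beams = diverse_beam_search(toy_score, ["a", "b", "c"],
                            beam_width=4, num_groups=2, max_len=2,
                            diversity_strength=1.0)
```

With the penalty active, the groups' top hypotheses begin with different tokens, whereas plain beam search with this scorer would start every beam with "a".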
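The byte pair encoding preprocessing used in (ii) to handle out-of-vocabulary words can be illustrated with a short sketch of the merge-learning step, following the common subword-BPE formulation. The function names, the `</w>` end-of-word marker, and the toy corpus are our illustrative choices, not the thesis's code.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across a (word -> frequency) vocabulary,
    where each word is a space-separated sequence of symbols."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every bounded occurrence of the pair into one symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations: repeatedly merge the most frequent
    adjacent symbol pair, starting from single characters."""
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3},
                          num_merges=2)
# The most frequent pairs merge first: ("e", "s"), then ("es", "t").
```

At inference time, the learned merges are applied in order to any new word, so a rare or unseen word decomposes into known subword units instead of becoming an out-of-vocabulary token.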