With the development of Internet technology, people face a large amount of text information every day. To help users quickly find the information they need in this explosively growing online environment, this thesis expands query terms through paraphrase generation to improve the performance of information retrieval systems. At the same time, automatic text summarization is used to analyze the retrieved results, extract the key information, and generate more concise text. This thesis explores deep-learning-based text generation techniques and studies two tasks: paraphrase generation and domain-specific automatic text summarization. We propose solutions to the problems in paraphrase generation, such as the lack of training corpora and the limited diversity of the generated text, as well as to the problems in domain-specific summarization, such as out-of-vocabulary words, long-distance dependencies, and summaries whose structure does not match the conventions of the target domain. The main work of this thesis is as follows:

(i) We design and implement a sequence-to-sequence paraphrase generation model consisting of a BERT extractor and an LSTM generator. The BERT extractor, with multiple layers of bidirectional attention, extracts deep language features from the input, while the LSTM generator, a well-trained language model, generates the paraphrases. Because existing paraphrase corpora are insufficient and unbalanced, paraphrasing models usually struggle to generate language that is both fluent and accurate. To address this problem, we train our model by combining the feature-based and fine-tuning strategies. We also propose a context-based paraphrase generation method that enables the model to paraphrase at the chapter level. In the prediction stage, we use a diverse beam search generation strategy in place of traditional beam search or greedy sampling, which helps the model generate multiple paraphrases with different expressions. Experimental results on three datasets of different granularity show that our training scheme and generation strategy are effective: a well-trained paraphrase generation model can produce multiple high-quality paraphrases.

(ii) We design and implement a sequence-to-sequence text summarization model with two independent encoders. In contrast to the traditional single-encoder architecture, we use a separate encoder to extract the structural (frame) features contained in existing summary texts and feed them to the model as additional information, helping it generate summaries that follow a specific structural pattern. To address the long-distance dependency problem caused by the length of the source text in abstractive summarization, the entire model is built on the attention mechanism. We also preprocess the dataset with byte pair encoding to solve the out-of-vocabulary problem in summary generation. The results show that our model can generate high-quality summaries that conform to the domain's structural pattern.
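The diverse beam search strategy used in the prediction stage of (i) can be sketched as follows. This is a minimal illustrative implementation in plain Python, not the thesis's actual decoder: the names `score_fn` and `toy_score`, the toy vocabulary, and the simplifications (a Hamming-style penalty on tokens already chosen by earlier groups at the same step, folded directly into the running score) are our assumptions.

```python
import math

def diverse_beam_search(score_fn, vocab, beam_width, num_groups, max_len,
                        diversity_strength=0.5):
    """Diverse beam search sketch: the beam is split into groups, and each
    group is penalized for reusing tokens that earlier groups already
    selected at the same time step, encouraging varied outputs."""
    assert beam_width % num_groups == 0
    group_size = beam_width // num_groups
    # Each group keeps its own hypotheses as (token_sequence, running_score).
    groups = [[((), 0.0)] for _ in range(num_groups)]
    for _ in range(max_len):
        tokens_used = []  # tokens picked by earlier groups at this step
        new_groups = []
        for beams in groups:
            candidates = []
            for seq, score in beams:
                for tok in vocab:
                    # Penalize tokens already chosen by earlier groups here.
                    penalty = diversity_strength * tokens_used.count(tok)
                    candidates.append((seq + (tok,),
                                       score + score_fn(seq, tok) - penalty))
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:group_size]
            tokens_used.extend(seq[-1] for seq, _ in beams)
            new_groups.append(beams)
        groups = new_groups
    return [beam for beams in groups for beam in beams]

# Toy unigram scorer (ignores context): "a" is the globally likeliest token.
def toy_score(seq, tok):
    return math.log({"a": 0.5, "b": 0.3, "c": 0.2}[tok])

beams = diverse_beam_search(toy_score, ["a", "b", "c"],
                            beam_width=4, num_groups=2, max_len=2,
                            diversity_strength=1.0)
```

With the penalty active, the groups' top hypotheses begin with different tokens, whereas plain beam search with this scorer would start every beam with "a".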
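The byte pair encoding preprocessing used in (ii) to handle out-of-vocabulary words can be illustrated with a short sketch of the merge-learning step, following the common subword-BPE formulation. The function names, the `</w>` end-of-word marker, and the toy corpus are our illustrative choices, not the thesis's code.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across a (word -> frequency) vocabulary,
    where each word is a space-separated sequence of symbols."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every bounded occurrence of the pair into one symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations: repeatedly merge the most frequent
    adjacent symbol pair, starting from single characters."""
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3},
                          num_merges=2)
# The most frequent pairs merge first: ("e", "s"), then ("es", "t").
```

At inference time, the learned merges are applied in order to any new word, so a rare or unseen word decomposes into known subword units instead of becoming an out-of-vocabulary token.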