
A Study On Deep Neural Network-based Text Generation Method

Posted on: 2020-10-14    Degree: Master    Type: Thesis
Country: China    Candidate: X W Chen    Full Text: PDF
GTID: 2428330575466291    Subject: Computer application technology
Abstract/Summary:
Recently, the rapid progress of deep learning has facilitated the development of text generation. As typical text generation tasks, automatic text summarization and automatic text simplification have attracted intense research interest. Both aim to extract the core content of a source text and generate a version that is easier to read and understand, and they are effective means of coping with information overload and reading difficulties. At present, most mainstream neural methods adopt a single recurrent neural network as the encoder or decoder. This practice tends to cause issues such as poor representation of the source text, low semantic relevance between the generated sentence and the source text, word redundancy, and the out-of-vocabulary (OOV) problem. To address these issues, this dissertation studies text generation methods based on deep neural networks, focusing on two tasks: text summarization and text simplification. The main contributions are listed below.

(1) We design and implement an abstractive neural text summarization model. First, we devise a novel hybrid encoder that leverages both global and local contextual features by combining a recurrent neural network (RNN) and a convolutional neural network (CNN) to learn a joint representation of the source text, thus producing a higher-quality text representation. Second, we introduce two modified diverse beam search strategies: one incorporates a language model and grammatical soundness into the scoring function, while the other fosters diversity among the generated sentences during decoding by introducing a penalty term. Moreover, we propose a keyphrase reranking mechanism that ranks the hypothesis sentences produced by beam search according to a saliency score measuring the co-occurrence of keyphrases with the source text; this reranking promotes semantic relevance between the source text and the generated sentence. We conduct experiments on several datasets, including CNN/Daily Mail. The results show that the proposed model achieves promising improvements in performance compared with state-of-the-art baselines.

(2) We propose a subword-unit-based end-to-end sentence simplification model. To address the problems of training on rare words and generating OOV words in text simplification, a subword unit extraction method based on the byte pair encoding (BPE) algorithm is proposed. This method segments the text and extracts subword units to construct the vocabulary, which effectively reduces the vocabulary size and thus increases the efficiency of the sequence-to-sequence model. At the same time, the method associates words with similar morphology and covers more rare and OOV words. We apply the model to different datasets, including PWKP and WikNet. The experimental results show that the model significantly outperforms word-level methods.
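The hybrid encoder of contribution (1) can be illustrated with a minimal PyTorch sketch. This is not the thesis implementation; the hidden sizes, kernel width, and the simple per-token concatenation used to fuse the recurrent and convolutional views are assumptions made for illustration only.

```python
# Minimal sketch of a hybrid RNN+CNN encoder (illustrative, not the thesis code).
# Assumption: the recurrent branch captures global context, the convolutional
# branch captures local n-gram features, and the two are concatenated per token.
import torch
import torch.nn as nn


class HybridEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, rnn_hidden=256,
                 conv_channels=256, kernel_size=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Bidirectional GRU: global contextual features.
        self.rnn = nn.GRU(emb_dim, rnn_hidden, batch_first=True, bidirectional=True)
        # 1-D convolution over the token dimension: local n-gram features.
        self.conv = nn.Conv1d(emb_dim, conv_channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        emb = self.embedding(token_ids)                # (batch, seq_len, emb_dim)
        rnn_out, _ = self.rnn(emb)                     # (batch, seq_len, 2*rnn_hidden)
        conv_out = self.conv(emb.transpose(1, 2))      # (batch, conv_channels, seq_len)
        conv_out = torch.relu(conv_out).transpose(1, 2)
        # Joint representation: concatenate global and local features per token.
        return torch.cat([rnn_out, conv_out], dim=-1)


if __name__ == "__main__":
    enc = HybridEncoder(vocab_size=10000)
    x = torch.randint(1, 10000, (2, 20))
    print(enc(x).shape)  # torch.Size([2, 20, 768])
```

A decoder with attention would then attend over this joint representation; that part is omitted here.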
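The keyphrase reranking mechanism of contribution (1) can be sketched as follows. The abstract only states that the saliency score measures the co-occurrence of keyphrases with the source text, so the overlap count and length normalisation below are assumptions, not the thesis's actual scoring function.

```python
# Illustrative sketch of keyphrase-based reranking of beam-search hypotheses.
# Assumption: saliency is the length-normalised count of source-text keyphrases
# that also occur in a hypothesis; the thesis may define the score differently.
def saliency(hypothesis: str, keyphrases: list[str]) -> float:
    hyp = hypothesis.lower()
    hits = sum(1 for kp in keyphrases if kp.lower() in hyp)
    return hits / max(len(hypothesis.split()), 1)


def rerank(hypotheses: list[str], keyphrases: list[str]) -> list[str]:
    # Rank the candidates produced by beam search by their saliency score.
    return sorted(hypotheses, key=lambda h: saliency(h, keyphrases), reverse=True)


if __name__ == "__main__":
    kps = ["climate change", "emissions"]
    beams = [
        "the report discusses weather patterns",
        "the report links rising emissions to climate change",
    ]
    print(rerank(beams, kps)[0])
```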
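The subword extraction in contribution (2) follows the standard byte pair encoding procedure. The sketch below shows the core merge loop under common assumptions (character-level initial segmentation, a `</w>` end-of-word marker, and a small merge count chosen for illustration); it is not the dissertation's exact implementation.

```python
# Minimal byte pair encoding (BPE) merge-learning sketch (illustrative).
# Words are split into characters plus an end-of-word marker; the most frequent
# adjacent symbol pair is merged repeatedly to build subword units.
from collections import Counter
import re


def get_pair_counts(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    # Replace the space-separated pair with its concatenation in every word.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(corpus_words, num_merges=10):
    # Represent each word as space-separated characters plus an end marker.
    vocab = Counter(" ".join(list(w)) + " </w>" for w in corpus_words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    words = ["lower", "lowest", "newer", "newest", "low", "new"]
    merges, vocab = learn_bpe(words, num_merges=5)
    print(merges)
```

Because frequent character sequences become single vocabulary entries while rare words decompose into smaller units, the vocabulary stays compact and no word is ever truly out of vocabulary, which is the property the dissertation exploits.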
Keywords/Search Tags:Automatic Text Summarization, Automatic Text Simplification, Text Representation, Beam Search, Subword Units, Deep Neural Network