
Research And System Implementation Of English Text Simplification Based On Seq2Seq

Posted on: 2021-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Li
GTID: 2428330602975385
Subject: Engineering
Abstract/Summary:
Text simplification is an important task in natural language processing. Its goal is to rewrite complex sentences as simpler ones without significantly changing their original meaning, which lowers reading difficulty and makes text easier to understand. Most existing text simplification algorithms focus on English text and usually treat simplification as a monolingual machine translation task. Following the success of the Seq2Seq model in machine translation, many Seq2Seq-based text simplification algorithms have been proposed. However, three problems are still not handled well: long-distance dependencies, out-of-vocabulary (OOV) words, and the lack of training corpora. To address these three problems, this thesis proposes a text simplification algorithm based on the self-attention mechanism, and an unsupervised method for training the Seq2Seq model. The research content of this thesis is as follows:

(1) To better handle long-range dependencies, a text simplification algorithm based on the self-attention mechanism is proposed. The encoder and decoder in this algorithm use feed-forward neural networks based on self-attention instead of recurrent neural networks, and have better learning ability for longer sentences. In addition, a pointer-generator network is introduced in the model's decoder so that the model can handle previously unseen OOV words. Experimental results on the WikiLarge and WikiSmall English datasets show that the algorithm outperforms existing Seq2Seq-based text simplification algorithms.

(2) To reduce the dependence of existing Seq2Seq text simplification algorithms on labeled datasets, an unsupervised Seq2Seq-based text simplification algorithm is proposed. First, existing sentences are scored and sorted by Flesch Reading Ease to obtain a simple-text corpus and a complex-text corpus. Then, a denoising autoencoder is trained on each corpus, yielding one autoencoder for simple sentences and one for complex sentences. Next, the two autoencoders are combined to form an initial text simplification model and a text complication model. Finally, a back-translation strategy is used to turn the unsupervised text simplification problem into a supervised one, and the text simplification model continues to be optimized. Experimental results on the English dataset show that this method outperforms the existing unsupervised algorithm.

(3) This thesis uses the Flask framework to design an English text simplification web application that directly simplifies sentences entered by the user. The system is divided into three modules. The first is the system configuration module, which is responsible for configuring the text simplification model and the web server. The second is the text preprocessing module, whose steps include text segmentation, data cleaning, and feature extraction. The third is the text simplification module, which simplifies the English text and returns the results to the page for display.
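The corpus-splitting step in (2) can be illustrated with a minimal sketch. It uses the standard Flesch Reading Ease formula, 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word), applied to one sentence at a time. The abstract does not say how syllables are counted or where the cut-off between simple and complex sentences lies, so the vowel-group syllable heuristic, the threshold of 60.0, and the function names below are all assumptions made for illustration, not the thesis's implementation.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, e.g. "made" -> 1 syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(sentence: str) -> float:
    """Flesch Reading Ease of a single sentence; higher scores mean easier text."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    # One sentence per call, so words-per-sentence is simply len(words).
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))

def split_corpus(sentences, threshold=60.0):
    """Sort sentences by score (easiest first) and split at a hypothetical threshold."""
    scored = sorted(((flesch_reading_ease(s), s) for s in sentences), reverse=True)
    simple = [s for score, s in scored if score >= threshold]
    complex_ = [s for score, s in scored if score < threshold]
    return simple, complex_

if __name__ == "__main__":
    simple, complex_ = split_corpus([
        "The cat sat on the mat.",
        "Comprehensive institutional reorganization necessitates extraordinary administrative coordination.",
    ])
    print("simple:", simple)
    print("complex:", complex_)
```

In a pipeline like the one described, the two resulting lists would serve as the unpaired corpora on which the two denoising autoencoders are trained. A dictionary-based syllable counter would give more accurate scores than this heuristic.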
Keywords/Search Tags: Text Simplification, Sequence-to-Sequence, Self-attention Mechanism, Pointer-generator Network, Unsupervised, Autoencoders