Research And System Implementation Of English Text Simplification Based On Seq2Seq

Posted on:2021-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:T Y Li

GTID:2428330602975385

Subject:Engineering

Abstract/Summary:

Text simplification is an important task in natural language processing. Its goal is to rewrite complex sentences as simpler ones without significantly changing their original meaning, which lowers reading difficulty and makes text easier to understand. Most existing text simplification algorithms focus on English and usually treat the task as monolingual machine translation. Following the success of the Seq2Seq model in machine translation, many Seq2Seq-based text simplification algorithms have been proposed. However, three problems are still not handled well: long-distance dependencies, out-of-vocabulary (OOV) words, and the lack of training corpora. To better address these problems, this thesis proposes a text simplification algorithm based on the self-attention mechanism, together with an unsupervised method for training the Seq2Seq model. The research content is as follows:

(1) To better handle long-range dependencies, a text simplification algorithm based on the self-attention mechanism is proposed. The encoder and decoder use feed-forward neural networks with self-attention instead of recurrent neural networks, and therefore learn longer sentences more effectively. In addition, a pointer-generator network is added to the decoder so that the model can handle OOV words it has not seen during training. Experimental results on the WikiLarge and WikiSmall English datasets show that the algorithm outperforms existing Seq2Seq-based text simplification algorithms.

(2) To remove the dependence of existing Seq2Seq text simplification algorithms on labeled datasets, an unsupervised Seq2Seq-based text simplification algorithm is proposed. First, existing sentences are scored and sorted by Flesch Reading Ease to obtain a simple-text corpus and a complex-text corpus. Next, a denoising autoencoder is trained on each corpus, yielding one autoencoder for simple sentences and one for complex sentences. The two autoencoders are then combined to form an initial text simplification model and a text complication model. Finally, a back-translation strategy turns the unsupervised text simplification problem into a supervised one, and the text simplification model is optimized further. Experimental results on the English dataset show that this method outperforms existing unsupervised algorithms.

(3) An English text simplification web application is designed with the Flask framework; it directly simplifies sentences entered by the user. The system is divided into three modules. The first is the system configuration module, which configures the text simplification model and the web server. The second is the text preprocessing module, whose steps include text segmentation, data cleaning, and feature extraction. The third is the text simplification module, which simplifies the English text and returns the result to the page for display.
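
The readability-based corpus split described in (2) can be illustrated with a minimal Python sketch. It uses the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). Everything else here is an assumption made for illustration and is not taken from the thesis: the vowel-group syllable heuristic, the function names, and the threshold of 60 used to separate simple from complex sentences.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(sentence: str) -> float:
    """Flesch Reading Ease of a single sentence; higher scores mean easier text."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        raise ValueError("sentence contains no words")
    syllables = sum(count_syllables(w) for w in words)
    # One sentence per call, so words-per-sentence is simply len(words).
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))


def split_corpus(sentences, threshold=60.0):
    """Sort sentences by score (easiest first) and split them at a hypothetical threshold."""
    scored = sorted(((flesch_reading_ease(s), s) for s in sentences), reverse=True)
    simple = [s for score, s in scored if score >= threshold]
    complex_ = [s for score, s in scored if score < threshold]
    return simple, complex_


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "Notwithstanding considerable methodological heterogeneity, "
        "the investigators substantiated their hypothesis.",
    ]
    simple, complex_ = split_corpus(corpus)
    print("simple: ", simple)
    print("complex:", complex_)
```

In a pipeline like the one the abstract outlines, the two resulting lists would play the role of the simple and complex corpora on which the denoising autoencoders are trained. A real system would likely use a more accurate syllable counter and choose the split point empirically.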

Keywords/Search Tags:

Text Simplification, Sequence-to-Sequence, Self-attention Mechanism, Pointer-generator Network, Unsupervised, Autoencoders

Related items

1	Research On Automatic Summarization Based On Pointer Generator Networks Model
2	Improving Sequence-to-Sequence Research On Chinese Text Summarization Generation Method
3	Abstractive Document Summarization Based On Deep Sequence To Sequence Model
4	Entity-biased Multi-Source Pointer Generator For Article Event Summarization
5	Research On Automatic Generation Method Of English Text Headline
6	Research On Key Techniques Of Two Phase Automatic Summarization Algorithm For Long Text
7	Research On Chinese Abstractive Text Summarization Based On Sequence To Sequence Model
8	Research On Abstractive Summarization Based On Sequence-to-Sequence Neural Network Model
9	Research On Speech Synthesis Algorithm Based On Sequence To Sequence Model
10	Research On Answer Generation Based On Sequence-to-sequence Model