Research On Controllable Paraphrase Generation

Posted on: 2022-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Wang
GTID: 2518306563977789
Subject: Software engineering
Abstract/Summary:
A paraphrase is a text that expresses the same meaning with different words or sentence forms. Paraphrase generation produces, for a given sentence, paraphrases that keep its meaning but vary its expression, and it is widely used in natural language processing tasks. In recent years, as paraphrase generation technology has developed, a need has emerged to generate paraphrases that follow a specified syntax, and controllable paraphrase generation has become an active new research topic. Controllable paraphrase generation produces a paraphrase according to a given syntactic exemplar sentence, so that the paraphrase has the same syntax as the exemplar while preserving the meaning of the input sentence. Existing work achieves good syntactic similarity to the exemplar sentence but suffers from reduced preservation of the input sentence's meaning. To solve this problem, this thesis proposes a controllable paraphrase generation model based on hierarchical syntactic parsing that improves meaning preservation. In addition, research on controllable paraphrase generation requires a large-scale parallel paraphrase corpus with syntactic variation, which existing corpus construction methods cannot provide. This thesis therefore proposes a method for constructing a parallel paraphrase corpus with diverse syntactic forms, and applies the constructed corpus to other tasks to verify the method's validity. The main research contents and contributions of this thesis cover the following two aspects.

(1) This thesis proposes a controllable paraphrase generation model based on hierarchical syntactic parsing. Controllable paraphrase generation builds on the encoder-decoder paraphrase generation framework and adds a syntactic exemplar sentence as an extra input. This thesis studies how to encode the syntactic information of the given exemplar sentence so that it guides paraphrase generation during decoding. In our experiments, we found that existing work encodes the bottom layer of the constituency parse tree of the syntactic exemplar sentence. Because this bottom-layer information constrains the number of words and the parts of speech of the generated paraphrase too strongly, it causes an abrupt-ending problem, which reduces meaning preservation. To solve this problem, this thesis proposes a method for encoding and decoding the syntactic exemplar sentence based on hierarchical syntactic parsing. In the encoder, the model encodes an abstract parse tree taken from a higher level of the full parse tree. In the decoder, the model adds a control vector to the parse tree nodes, which adjusts the number of generated words so that the corresponding meaning is fully reflected in the generated paraphrase. The model is trained on a public English dataset. Experimental results show that, compared with the best existing models, our model improves the meaning-preservation metrics BLEU, ROUGE-1, ROUGE-2, ROUGE-L and METEOR by 0.87%, 1.33%, 0.09%, 0.94% and 0.66% respectively, and improves the syntactic-similarity metrics TED-R and TED-E by 0.56% and 1.79%. A further evaluation of the abrupt-ending problem shows that 92% of the affected cases are improved. These results indicate that the proposed method effectively alleviates the abrupt-ending problem while also improving both syntactic similarity and meaning preservation.

(2) This thesis proposes a method for constructing a parallel paraphrase corpus with diverse syntactic forms and verifies the constructed corpus on application tasks. Because training controllable paraphrase generation models lacks data with diverse syntactic forms, this thesis constructs a parallel paraphrase corpus using multiple translation engines and automatic machine translation evaluation methods. Specifically, it builds a Chinese paraphrase corpus from an English paraphrase corpus by selecting several high-quality translation engines, and designs an algorithm based on meaning similarity and syntactic diversity to select translated sentence pairs. From the Quora English paraphrase corpus, this thesis obtains 115k parallel Chinese paraphrase pairs. The constructed corpus is verified on paraphrase recognition and natural language inference tasks; the thesis also compares using a single translation engine with using multiple engines, and evaluates the selection algorithm. Experimental results show that, compared with the baseline models, the paraphrase recognition and inference models trained on the constructed corpus improve by 8.59% and 2.64% respectively. This shows that the proposed method can construct paraphrase pairs with diverse syntactic forms that enhance the robustness of downstream task models. In addition, few-shot experiments show that the constructed corpus provides a data augmentation effect for low-resource tasks.

In summary, for controllable paraphrase generation, this thesis proposes a model based on hierarchical syntactic parsing, and evaluation on public datasets shows that it outperforms existing methods. To support research on controllable paraphrase generation, this thesis also proposes a method for constructing a parallel paraphrase corpus with diverse syntactic forms and verifies its effectiveness through comparative experiments.
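The difference between encoding the bottom layer and a higher level of the exemplar's constituency parse tree can be illustrated with a short sketch. The Python below is not the thesis's implementation: it assumes Penn-Treebank-style bracketed parses and uses a simple depth cutoff (the hypothetical parameter `max_depth`) as the way of choosing a "higher level", which the abstract does not specify. Word leaves are always dropped, so only constituent labels are linearized into a syntactic template. The decoder-side control vector is not sketched here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str  # constituent label, POS tag, or (for leaves) the word itself
    children: List["Node"] = field(default_factory=list)


def parse_bracketed(s: str) -> Node:
    """Parse a Penn-Treebank-style string such as '(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())
            else:
                node.children.append(Node(tokens[pos]))  # a word becomes a childless leaf
                pos += 1
        pos += 1  # skip ")"
        return node

    return parse()


def prune_to_template(node: Node, max_depth: int, depth: int = 0) -> str:
    """Linearize the tree down to max_depth, keeping constituent labels and dropping words."""
    constituents = [c for c in node.children if c.children]  # words have no children
    if depth == max_depth or not constituents:
        return node.label
    inner = " ".join(prune_to_template(c, max_depth, depth + 1) for c in constituents)
    return f"({node.label} {inner})"


exemplar = parse_bracketed(
    "(ROOT (S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))) (. .)))"
)
print(prune_to_template(exemplar, 2))   # (ROOT (S NP VP .))
print(prune_to_template(exemplar, 3))   # (ROOT (S (NP DT NN) (VP VBD PP) .))
print(prune_to_template(exemplar, 10))  # (ROOT (S (NP DT NN) (VP VBD (PP IN (NP DT NN))) .))
```

With no effective cutoff, the template contains one POS tag per word of the exemplar, which is the kind of bottom-layer constraint the abstract associates with abrupt endings. At `max_depth=2` only `NP VP .` remain, so the decoder is free to decide how many words realize each phrase.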
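The pair-selection step of the corpus construction method can be sketched in the same hedged way. The abstract does not give the algorithm's scoring functions, so the Python below assumes a simple rule: among all combinations of engine outputs for the two sides of an English paraphrase pair, keep those whose meaning similarity reaches a threshold and return the most syntactically diverse one. Both scores are crude, hypothetical stand-ins: a character-level Dice overlap for meaning similarity (order-insensitive, loosely in the spirit of n-gram MT evaluation metrics) and normalized Levenshtein distance for diversity (order-sensitive). A parse-tree edit distance, such as the TED measure the abstract uses for syntactic similarity, would be a more faithful diversity score. The engine names and the `min_sim` threshold are illustrative only.

```python
from collections import Counter
from itertools import product
from typing import Dict, Optional, Tuple


def meaning_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in: Dice overlap of character multisets (ignores word order)."""
    ca, cb = Counter(a), Counter(b)
    total = sum(ca.values()) + sum(cb.values())
    return 2 * sum((ca & cb).values()) / total if total else 0.0


def order_diversity(a: str, b: str) -> float:
    """Hypothetical stand-in for syntactic diversity: normalized Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                     # deletion
                           cur[j - 1] + 1,                  # insertion
                           prev[j - 1] + (ch_a != ch_b)))   # substitution or match
        prev = cur
    longest = max(len(a), len(b))
    return prev[-1] / longest if longest else 0.0


def select_pair(
    side1: Dict[str, str], side2: Dict[str, str], min_sim: float = 0.6
) -> Optional[Tuple[str, str]]:
    """Pick one translation for each side of an English paraphrase pair.

    side1/side2 map an engine name to that engine's translation of each English sentence.
    Identical strings and combinations below min_sim are discarded; among the rest the
    most diverse pair wins (the first one found on ties). Returns None if nothing qualifies.
    """
    best, best_div = None, -1.0
    for t1, t2 in product(side1.values(), side2.values()):
        if t1 == t2 or meaning_similarity(t1, t2) < min_sim:
            continue
        div = order_diversity(t1, t2)
        if div > best_div:
            best, best_div = (t1, t2), div
    return best
```

Because both stand-ins work on characters, they apply directly to unsegmented Chinese text; replacing them with stronger scorers only requires swapping the two functions.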
Keywords/Search Tags: Paraphrase generation, Controllable paraphrase generation, Encoder-Decoder, Syntactic parse tree, Paraphrase parallel corpus, Paraphrase recognition, Natural language inference