
The Study On Paraphrase Generation Based On Neural Network

Posted on: 2020-02-23 | Degree: Master | Type: Thesis
Country: China | Candidate: B B Ma | Full Text: PDF
GTID: 2428330575994968 | Subject: Computer technology
Abstract/Summary:
Paraphrases are different expressions of the same meaning within one language. They reflect the flexibility and diversity of natural language, but they are also a bottleneck for natural language processing (NLP). Paraphrase generation addresses this by producing multiple different sentences that share the meaning of a given sentence. Because it improves model robustness, paraphrase generation is widely used in machine translation, automatic question answering, text summarization, and other tasks. Research on paraphrase generation faces three problems: (1) the encoder-decoder framework mishandles unknown (UNK) words and low-frequency words and tends to repeat words; (2) the limited size of paraphrase parallel corpora restricts how well the encoder learns semantic representations, which blocks further performance gains; (3) the lack of a Chinese paraphrase corpus makes research on Chinese paraphrase generation difficult. To address these problems, this thesis introduces an attention mechanism, a copy mechanism, a coverage mechanism, and a multi-task learning framework. It also uses NLP techniques to construct a Chinese paraphrase parallel corpus and builds further research on Chinese paraphrase generation on that corpus. The main research contents and contributions are as follows.
(1) A multi-mechanism fused neural paraphrase generation model (the MMF model). Existing paraphrase generation models handle unknown and low-frequency words poorly, which causes severe information loss. They also ignore earlier decoding decisions, which leads to the same words being repeated. We therefore combine an attention-based copy mechanism with a coverage mechanism to build the MMF model, and train it separately on the Quora and MSCOCO corpora to evaluate the contribution of each mechanism. On Quora, MMF improves on the baseline model by 4.18%, 4.25%, 4.08%, and 3.19% in ROUGE-1, ROUGE-2, BLEU, and METEOR, respectively. These results show that MMF effectively addresses unknown words, low-frequency words, and word repetition, and verify its effectiveness.
(2) A paraphrase generation model with a jointly learned auto-encoding task (the JLAE model). The limited size of paraphrase parallel corpora restricts the encoder's ability to learn semantic representations, which lowers paraphrase quality. We therefore learn the paraphrase generation task and an auto-encoding task jointly in a multi-task learning framework. The two tasks share one encoder, so both strengthen its semantic representation learning. With the MMF model as the baseline, JLAE improves ROUGE-1, ROUGE-2, BLEU, and METEOR by 1.32%, 2.04%, 1.12%, and 0.82%, respectively, which verifies its effectiveness.
(3) A Chinese paraphrase corpus construction method based on multiple translation engines (the MTE-CPC method). The lack of a Chinese paraphrase corpus hinders Chinese paraphrase research. Given abundant English paraphrase resources and mature machine translation technology, we propose MTE-CPC to construct a Chinese paraphrase corpus (CPC) and obtain a CPC of 270k examples. We then analyze the corpus and identify 13 types of Chinese paraphrase phenomena, 3 of which are unique to Chinese. Finally, we run Chinese paraphrase generation experiments. On evaluation sets with 3 references, ROUGE-1, ROUGE-2, BLEU, and METEOR reach 53.59%, 27.03%, 62.23%, and 37.18%, respectively. These results indicate that MTE-CPC contributes significantly to Chinese paraphrase research and again confirm the effectiveness of the two proposed models.
In view of the defects of existing paraphrase generation models, this thesis designs the MMF model to address unknown words, low-frequency words, and word repetition, and proposes the JLAE model to improve semantic representation learning. By constructing a large-scale Chinese paraphrase parallel corpus, it also builds a Chinese paraphrase generation model. The proposed models and method are verified on international open datasets.
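Contribution (1) combines an attention-based copy mechanism with a coverage mechanism, but the abstract gives no equations. The sketch below assumes the widely used pointer-generator formulation: the final distribution mixes the vocabulary softmax and the attention weights through a generation probability `p_gen`, and the coverage loss sums min(a_i, c_i) over source positions. The function names, toy vocabulary, and numbers are illustrative assumptions, not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_coverage_step(source_tokens, attn_scores, vocab_probs, p_gen, coverage):
    """One decoding step with an assumed pointer-generator-style copy + coverage.

    source_tokens: source words x_1..x_n (may contain out-of-vocabulary words)
    attn_scores:   unnormalised attention scores for this step, one per source word
    vocab_probs:   dict word -> P_vocab(w) from the decoder softmax (sums to 1)
    p_gen:         probability of generating from the vocabulary, in [0, 1]
    coverage:      c_i, the sum of attention weights from all earlier steps
    Returns (distribution over vocabulary + source words, coverage loss, new coverage).
    """
    attn = softmax(attn_scores)

    # Generation part: scale the fixed-vocabulary distribution by p_gen.
    final = {w: p_gen * p for w, p in vocab_probs.items()}

    # Copy part: the remaining (1 - p_gen) mass goes to source words in proportion
    # to attention, so an out-of-vocabulary source word can still be emitted.
    for word, a in zip(source_tokens, attn):
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * a

    # Coverage loss: penalise attention on positions that are already covered,
    # which discourages attending to (and repeating) the same source words.
    cov_loss = sum(min(a, c) for a, c in zip(attn, coverage))
    new_coverage = [c + a for c, a in zip(coverage, attn)]
    return final, cov_loss, new_coverage


# Toy example: "tensorflow" is not in the decoder vocabulary but can be copied.
src = ["how", "do", "i", "learn", "tensorflow"]
vocab = {"how": 0.4, "can": 0.3, "i": 0.2, "learn": 0.1}
scores = [0.1, 0.0, 0.2, 0.5, 2.0]
dist, loss1, cov = copy_coverage_step(src, scores, vocab, p_gen=0.6, coverage=[0.0] * 5)
_, loss2, _ = copy_coverage_step(src, scores, vocab, p_gen=0.6, coverage=cov)
print(dist["tensorflow"], loss1, loss2)
```

The first step has zero coverage loss; repeating exactly the same attention on the next step is penalised by the full attention mass, which is how coverage discourages repetition.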
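Contribution (2) trains paraphrase generation and auto-encoding with one shared encoder. A minimal sketch of such a joint objective, assuming a weighted sum `L = L_para + lam * L_ae` (the weight `lam` is a hypothetical hyperparameter) and replacing the real encoder and decoders with toy mean-pooling and bag-of-words stand-ins, so only the wiring of the two tasks is meaningful:

```python
import math


def shared_encoder(tokens, embeddings):
    """Hypothetical shared encoder: mean of word embeddings (stand-in for an RNN)."""
    dim = len(next(iter(embeddings.values())))
    h = [0.0] * dim
    for t in tokens:
        for k, v in enumerate(embeddings[t]):
            h[k] += v / len(tokens)
    return h


def decoder_nll(h, target, out_weights):
    """Hypothetical bag-of-words decoder: softmax(W h) over the vocabulary,
    scored as the mean negative log-likelihood of the target tokens."""
    vocab = list(out_weights)
    logits = [sum(w * x for w, x in zip(out_weights[v], h)) for v in vocab]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    log_p = {v: s - log_z for v, s in zip(vocab, logits)}
    return -sum(log_p[t] for t in target) / len(target)


def joint_loss(src, paraphrase, embeddings, para_dec, ae_dec, lam=1.0):
    """Multi-task objective: paraphrase generation + lam * auto-encoding."""
    h = shared_encoder(src, embeddings)            # one encoder serves both tasks
    l_para = decoder_nll(h, paraphrase, para_dec)  # task 1: src -> paraphrase
    l_ae = decoder_nll(h, src, ae_dec)             # task 2: src -> src (reconstruction)
    return l_para + lam * l_ae, l_para, l_ae


words = ["what", "is", "ai", "define"]
emb = {w: [0.1 * (i + 1), -0.05 * i] for i, w in enumerate(words)}
para_dec = {w: [0.3 * i, 0.2] for i, w in enumerate(words)}
ae_dec = {w: [-0.1 * i, 0.4] for i, w in enumerate(words)}
total, l_para, l_ae = joint_loss(["what", "is", "ai"], ["define", "ai"], emb, para_dec, ae_dec, lam=0.5)
print(total, l_para, l_ae)
```

Because both losses are computed from the same encoder output `h`, an autodiff framework would send gradients from both tasks into the shared encoder parameters, which is the mechanism the abstract credits for better semantic representations.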
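The abstract states only that MTE-CPC builds the Chinese corpus from English paraphrase resources with multiple translation engines. One plausible reading, sketched here as an assumption rather than the thesis's actual procedure, is to translate both sides of each English paraphrase pair with every engine and pair up the distinct outputs. `build_chinese_pairs` and the stub engines are hypothetical; a real pipeline would call actual MT services and add quality filtering.

```python
from itertools import combinations


def build_chinese_pairs(english_pairs, engines):
    """Hypothetical MTE-CPC-style construction step.

    english_pairs: (sentence_a, sentence_b) English paraphrase pairs
    engines:       callables, each standing in for one machine translation engine
    Returns sorted, deduplicated (zh_x, zh_y) candidate paraphrase pairs.
    """
    pairs = set()
    for a, b in english_pairs:
        # Every engine translates both sides; identical outputs collapse in the set.
        candidates = {translate(s) for translate in engines for s in (a, b)}
        # Any two distinct translations of the same meaning form a candidate pair,
        # including two engines' translations of the same English sentence.
        for x, y in combinations(sorted(candidates), 2):
            pairs.add((x, y))
    return sorted(pairs)


# Stub engines (assumed): real ones would return Chinese text from an MT service.
def engine_a(sentence):
    return "A:" + sentence


def engine_b(sentence):
    return "B:" + sentence


corpus = build_chinese_pairs([("how do i lose weight", "how can i lose weight")], [engine_a, engine_b])
print(len(corpus))
```

Using several engines is what lets one English pair yield more than one Chinese pair: here two engines and two sentences give four distinct outputs and therefore six candidate pairs.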
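The Chinese results are reported on evaluation sets with three references per input. To illustrate how multiple references enter such metrics, the sketch below computes BLEU's modified (clipped) n-gram precision, where each candidate n-gram count is clipped by its maximum count in any single reference. Full BLEU also combines n = 1..4 with a geometric mean and a brevity penalty, which are omitted here; the abstract does not say which scoring toolkit was used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against several references."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    # For each n-gram, keep its largest count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(cand, refs, 1))
```

This is the well-known degenerate candidate from the original BLEU paper: seven copies of "the" are clipped to the two occurrences in the first reference, giving 2/7, while the second reference alone would give only 1/7. Extra references can raise the score but never lower it.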
Keywords/Search Tags: Paraphrase generation, Multi-mechanism fusion, Multi-task learning, Encoder-Decoder, Auto-encoding, Construction of Chinese paraphrase corpus, Deep neural network