
A Research On Paraphrase Detection And Generation Based On Sentence Representation

Posted on: 2020-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: X X Liu
Full Text: PDF
GTID: 2428330575452507
Subject: Computer Science and Technology
Abstract/Summary:
Interest in semantic understanding for NLP has grown since the surge in deep-learning research and the resulting advances in basic NLP tasks. Paraphrasing, the task of yielding various expressions of the same semantic meaning, is one of the touchstones of natural language understanding. It has proved valuable in NLP tasks such as machine translation, summarization, information retrieval, information extraction and question answering.

Paraphrase research has two main topics: paraphrase detection and paraphrase generation. Paraphrase detection aims to determine whether two sentences share the same semantic meaning, while paraphrase generation seeks to generate various sentence expressions from one semantic meaning. However, the complex nature of natural language poses a challenge to researchers, since the very meaning of a sentence can be altered by its context or expressed in an utterly different style.

Early approaches to paraphrase research rely on feature engineering derived from specific linguistic rules, and suffer from low efficiency and poor scalability. As deep learning has proved its efficacy, attentive sequence-to-sequence (seq2seq) modeling has become one of the mainstream approaches in recent paraphrase research. Because they are designed to learn a sequential mapping between exact labels in the training data, attentive seq2seq models struggle to produce varied expressions.

Unlike existing work, we base paraphrase research on constructing a deep representation model that is capable of encoding universal semantic features. The deep representation can also be used to generate various expressions from the encoded features. So far there has been little exploration based on this intuition, so we propose a method that uses ensemble techniques to learn representations suitable for paraphrase detection and generation. The work can be summarized as follows:

A stronger baseline representation for paraphrase tasks is proposed. The thesis first summarizes various approaches to representation learning and semantic understanding, five of which are ensembled into a better representation learning baseline.

A paraphrase generation approach based on the learnt representation is proposed. Unlike existing works based on specific sequential mappings, the thesis fills the gap by generating sequences from semantic representations. Experiments show that the proposed method is able to derive diverse expressions from a given semantic representation.

Last but not least, a paraphrase detection and generation system based on the learnt semantic representation is proposed. The system can provide other NLP tasks with text similarity evaluation or paraphrase augmentation. It also visually demonstrates the findings of this thesis and helps deepen thinking about representation and paraphrase research.
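
The abstract does not say how the five representations are combined or how a detection decision is made from them. The following Python sketch is only an illustration of the general idea, under stated assumptions: each encoder is assumed to output a fixed-length vector per sentence, the ensemble is assumed to be a simple concatenation of L2-normalized vectors, and detection is assumed to be a cosine-similarity threshold. The function names, the toy vectors and the 0.8 threshold are all hypothetical and are not taken from the thesis.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def ensemble_representation(vectors):
    """Assumed ensemble rule: concatenate the L2-normalized output of each encoder."""
    combined = []
    for vec in vectors:
        combined.extend(l2_normalize(vec))
    return combined


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def is_paraphrase(vectors_a, vectors_b, threshold=0.8):
    """Assumed decision rule: similarity of ensembled vectors at or above a threshold."""
    rep_a = ensemble_representation(vectors_a)
    rep_b = ensemble_representation(vectors_b)
    return cosine_similarity(rep_a, rep_b) >= threshold


# Toy outputs of two hypothetical encoders for three sentences.
sentence_a = [[1.0, 0.0], [0.5, 0.5]]
sentence_b = [[0.9, 0.1], [0.4, 0.6]]   # close to sentence_a in both encoders
sentence_c = [[0.0, 1.0], [-0.5, 0.5]]  # orthogonal to sentence_a in both encoders

print(is_paraphrase(sentence_a, sentence_b))
print(is_paraphrase(sentence_a, sentence_c))
```

Normalizing each encoder's output before concatenation keeps any single encoder from dominating the similarity score because of its scale. A real system would replace the toy vectors with learnt sentence representations and tune the threshold on labeled paraphrase pairs.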
Keywords/Search Tags:Language Understanding, Sentence Representation, Paraphrase Detection, Paraphrase Generation, Ensemble