Adapting Machine Translation Models For Paraphrase Generation

Posted on: 2011-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Lan
Full Text: PDF
GTID: 2178330338479940
Subject: Computer Science and Technology
Abstract/Summary:
Paraphrases are different expressions of the same meaning. They are very common in natural language and reflect the flexibility and diversity of human languages. In recent years, with the development of fundamental NLP technology, research on paraphrasing has received growing attention. Many researchers have studied paraphrasing and applied it in NLP fields such as Information Retrieval (IR), Question Answering (QA), Information Extraction (IE), Automatic Summarization, and Machine Translation (MT), improving the performance of these systems.
Research on paraphrasing falls into two main directions. One is paraphrase acquisition, which aims to extract paraphrases of different granularities and forms (such as paraphrase sentences, phrases, and patterns) from various corpora or resources using different methods. The other is paraphrase generation (generally at the sentence level), which aims to generate paraphrases for given sentences. This thesis focuses on statistical paraphrase generation.
Paraphrase generation is important in many NLP applications, yet research on it remains far from sufficient. Through analysis and comparison of paraphrase generation with related research (especially machine translation), we propose a statistical paraphrase generation method. The approach has two distinguishing features: (1) it can generate paraphrase sentences for distinct applications with a uniform statistical model; (2) it can easily combine multiple paraphrase resources to improve the performance of paraphrase generation. However, the method depends on large amounts of high-quality paraphrase resources, which are difficult to acquire. We therefore propose a method that leverages multiple machine translation (MT) engines for paraphrase generation (PG).
First, we use a multi-pivot approach to acquire a set of candidate paraphrases for a source sentence S. Then we employ two kinds of techniques, the selection-based technique and the decoding-based technique, to produce a best paraphrase T for S from the candidates acquired in the first stage. The results show that the method can be easily adapted from one application to another and generates valuable and interesting paraphrases, and that the multi-pivot approach is effective for obtaining plenty of valuable candidate paraphrases, which improves performance.
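The two-stage pipeline can be illustrated with a minimal Python sketch. The abstract does not specify how pivoting or selection is implemented, so everything here is an assumption made for illustration: candidates are produced by round-trip translation through each pivot language with each engine, and the "selection-based" step is approximated by picking the candidate that agrees most with the others (word-set Jaccard overlap). The names `Engine`, `make_toy_engine`, `multi_pivot_candidates`, `overlap`, and `select_best` are hypothetical, and the toy engine is a lookup table standing in for a real MT system. The decoding-based technique is not covered.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical engine signature: (text, source_lang, target_lang) -> translation.
Engine = Callable[[str, str, str], str]


def make_toy_engine(table: Dict[Tuple[str, str, str], str]) -> Engine:
    """Stand-in for a real MT engine: a lookup table that echoes unknown input."""
    return lambda text, src, tgt: table.get((text, src, tgt), text)


def multi_pivot_candidates(sentence: str, engines: Dict[str, Engine],
                           pivots: List[str], lang: str = "en") -> List[str]:
    """Round-trip the sentence through every (engine, pivot language) pair."""
    candidates: List[str] = []
    for engine in engines.values():
        for pivot in pivots:
            back = engine(engine(sentence, lang, pivot), pivot, lang)
            # Keep only unseen candidates that differ from the source sentence.
            if back.lower() != sentence.lower() and back not in candidates:
                candidates.append(back)
    return candidates


def overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercased word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_best(candidates: List[str]) -> str:
    """Assumed selection rule: highest average overlap with the other candidates."""
    if not candidates:
        raise ValueError("no candidate paraphrases to select from")
    if len(candidates) == 1:
        return candidates[0]

    def consensus(index: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != index]
        return sum(overlap(candidates[index], o) for o in others) / len(others)

    best_index = max(range(len(candidates)), key=consensus)
    return candidates[best_index]
```

In use, `multi_pivot_candidates` would be called with real engine wrappers and a list of pivot languages, and its output passed to `select_best`. A consensus rule is only one plausible reading of "selection-based"; a real system might instead score candidates with a language model or a paraphrase-quality metric.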
Keywords/Search Tags: application-driven, statistical model, combine multi-resources, multi-pivot approach, paraphrase generation