Paraphrase Extraction From Interactive Q&A Community

Posted on: 2013-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: W B Zhang
GTID: 2218330362958748
Subject: Computer application technology
Abstract/Summary:
Paraphrase, that is, different expressions of the same meaning, is a common phenomenon in natural language. Paraphrasing technology has been applied in many areas of natural language processing, such as machine translation, information extraction, question answering and automatic summarization. It still has many defects and difficulties: paraphrases are hard to extract with high precision, source corpora contain too much noise, and the results cannot be employed directly. Paraphrase has been widely researched over the last decade. Most of this research focuses on acquiring paraphrases from various language resources and on generating paraphrases. How to build a large-scale paraphrase corpus is a hot topic, and it is also the first step in paraphrase research.
Interactive question answering communities are a special kind of Q&A platform: they skip natural language understanding by computer and simply provide a platform for communication among people. Their corpora grow quickly and contain sentences with diverse expressions. These advantages are of great value for paraphrase research and can extend paraphrase corpora on a large scale. In this paper, we propose a method for extracting paraphrases from interactive Q&A communities. First, we construct a distributed web crawler to fetch a large corpus. Second, we demonstrate the feasibility of extracting paraphrases from interactive Q&A communities by analyzing the features of these communities and studying in depth the paraphrasing methods of recent years. Third, we extract candidate paraphrases by calculating the title similarity between two questions. Finally, we explain the steps of paraphrase extraction in detail: how to use an SVM classifier to extract paraphrases from the candidates, and how to choose features for the binary paraphrase/non-paraphrase classification. We analyze the performance of our method through experiments and comparisons.
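The candidate-extraction step (pairing questions whose titles are similar) could be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the similarity measure, so Jaccard overlap of whitespace tokens, the 0.5 threshold, and the helper names `jaccard` and `candidate_pairs` are all hypothetical choices.

```python
from itertools import combinations


def tokenize(title):
    """Lowercase a title and split it on whitespace (assumed tokenization)."""
    return set(title.lower().split())


def jaccard(title_a, title_b):
    """Jaccard overlap of the two titles' token sets; 0.0 if either is empty."""
    tokens_a, tokens_b = tokenize(title_a), tokenize(title_b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def candidate_pairs(titles, threshold=0.5):
    """Return every pair of titles whose similarity meets the threshold.

    The threshold value is an assumption chosen for this example.
    """
    return [
        (a, b)
        for a, b in combinations(titles, 2)
        if jaccard(a, b) >= threshold
    ]


# Made-up question titles, for illustration only.
titles = [
    "how to learn python quickly",
    "how to learn python fast",
    "best way to cook rice",
]
pairs = candidate_pairs(titles)
```

Comparing all pairs is quadratic in the number of titles, so a crawl of realistic size would need some form of blocking or indexing first. In the method described here, the candidates kept by this step are then passed to the SVM classifier for the paraphrase/non-paraphrase decision.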
The results show that precision, recall and F-measure reach 0.7725, 0.7349 and 0.7532 respectively. Further comparative experiments on feature selection identify the key features for paraphrase extraction from interactive Q&A communities.
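As a quick consistency check on the reported scores, the F-measure can be recomputed from precision and recall. The sketch below assumes the balanced F1 (harmonic mean, beta = 1), which the abstract does not state explicitly; the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """General F-beta score; beta=1.0 gives the harmonic mean (F1)."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Precision and recall as reported in the abstract.
f1 = f_measure(0.7725, 0.7349)
print(round(f1, 4))  # prints 0.7532
```

The recomputed value matches the reported 0.7532 to four decimal places, which is consistent with the F-measure being the balanced F1.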
Keywords/Search Tags: Paraphrase Extraction, Interactive Q&A Community, SVM