Paraphrase Extraction From Interactive Q&A Community

Posted on:2013-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:W B Zhang

Full Text:PDF

GTID:2218330362958748

Subject:Computer application technology

Abstract/Summary:

Paraphrase, the expression of the same meaning in different words, is a common phenomenon in natural language. Paraphrasing technology has been applied in many areas of natural language processing, such as machine translation, information extraction, question answering and automatic summarization. It still faces a number of difficulties: paraphrases are hard to extract with high precision, source corpora contain too much noise, and the extracted paraphrases often cannot be used directly. Paraphrase has been widely studied over the last decade, with most research focused on acquiring paraphrases from various language resources and on generating paraphrases. How to build a large-scale paraphrase corpus is an active research topic and the first step in paraphrase research.

Interactive question answering communities are a special kind of Q&A platform: rather than relying on computer understanding of natural language, they simply provide a place for people to communicate with one another. Their corpora grow quickly and contain sentences expressed in diverse ways. These properties make them valuable for paraphrase research and allow paraphrase corpora to be extended on a large scale.

In this paper we propose a method for extracting paraphrases from interactive Q&A communities. First, we build a distributed web crawler to collect a large corpus. Second, we demonstrate the feasibility of extracting paraphrases from interactive Q&A communities by analyzing the characteristics of these communities and studying recent paraphrasing methods in depth. Third, we extract candidate paraphrases by computing the similarity between the titles of two questions. Finally, we explain the paraphrase extraction steps in detail: how an SVM classifier is used to extract paraphrases from the candidates, and how features are chosen for the binary paraphrase/non-paraphrase classification. We analyze the performance of our method through experiments and comparisons. The results show that precision, recall and F-measure reach 0.7725, 0.7349 and 0.7532 respectively, and further comparative experiments on feature selection identify the key features for paraphrase extraction from interactive Q&A communities.
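
The candidate-extraction step and the reported scores can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the title similarity measure or its cutoff, so the Jaccard word overlap, the 0.5 threshold, the whitespace-style tokenization (which would not suit unsegmented Chinese text) and the function names `tokens`, `jaccard`, `candidate_pairs` and `f_measure` are all assumptions made for illustration. The SVM classification stage is left out.

```python
import re
from itertools import combinations


def tokens(title):
    """Lowercase a question title and split it into a set of word tokens."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(titles, threshold=0.5):
    """Return (i, j, score) for title pairs whose similarity meets the threshold.

    Jaccard overlap and the 0.5 threshold are illustrative assumptions;
    the abstract does not say which measure or cutoff was used.
    """
    token_sets = [tokens(t) for t in titles]
    pairs = []
    for i, j in combinations(range(len(titles)), 2):
        score = jaccard(token_sets[i], token_sets[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example titles, not data from the thesis.
titles = [
    "How do I reset my router password",
    "How do I reset the password of my router",
    "Best way to cook rice",
]
print(candidate_pairs(titles))

# Consistency check on the figures reported in the abstract.
print(round(f_measure(0.7725, 0.7349), 4))
```

With these example titles only the first two questions form a candidate pair, which would then be passed to the classifier. The last line confirms that the reported F-measure of 0.7532 is the harmonic mean of the reported precision and recall.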

Keywords/Search Tags:

Paraphrase Extraction, Interactive Q&A Community, SVM

Related items

1	Research On Statistical Paraphrase Acquisition And Generation
2	Research On Paraphrase Processing Methods Based On Neural Networks
3	Research On Chinese Paraphrase Patterns And Collocations Extraction
4	Research On Controllable Paraphrase Generation
5	Study On Lexical And Phrasal Paraphrase Extraction Based On Context Analysis
6	Research On Relation Extraction Method Based On Paraphrase And Multi-Information Fusion
7	Syntactically Controlled Paraphrase Generation
8	Research On Fine-grained Chinese Paraphrase Extraction Technology Based On Deep Learning
9	Research On The Method Of Automatic Paraphrase Extraction Based On Markov Network Model
10	Interactive Mechanism Of Online Mukbang Community On Bilibili Video Website