Research Of Question Expansion Based On Paraphrase

Posted on: 2012-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: W P Kang
GTID: 2218330362450478
Subject: Computer application technology
Abstract/Summary:
Question expansion (QE) is a technique that improves the retrieval performance of an initial query, so that the search results better match the user's intent, either by adding more effective words or phrases or by reformulating the seed question. The direct motivation for question expansion is word mismatch, which stems from the flexibility and diversity of natural-language expression.
Research on question expansion has two main aspects: constructing expansion resources and developing expansion methods. In this paper, we address the keyword mismatch problem at the semantic level with a technique we call paraphrase-based question expansion. On the one hand, this paper describes a method for automatically extracting paraphrase phrases using online translators and dictionaries; on the other hand, it explores how to use the resulting paraphrase resources for question expansion and proposes three new expansion methods based on language model checking.
The extraction method, which uses multiple online translators and dictionaries, is treated as a statistical machine translation process. Source-language phrases are first translated into several intermediate languages by these online systems, and the intermediate-language phrases are then translated back into the source language; linking the two steps through the intermediate languages yields a translation model between phrases of the source language itself. The method easily produces a large number of paraphrase phrases for question expansion: the average accuracy of the extracted paraphrases is close to 70%, and the average number of paraphrases per phrase reaches 6.
In the question analysis stage, this paper focuses on keyword analysis and keyword weighting.
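The back-translation extraction described above can be read as pivot-based paraphrasing: the probability of a paraphrase e2 given a source phrase e1 is approximated by summing p(f | e1) * p(e2 | f) over intermediate-language phrases f. The abstract does not give the exact model, so the sketch below assumes that formulation, replaces real online translators with hard-coded hypothetical outputs, and estimates each probability as the fraction of systems that returned a given translation. FORWARD, BACKWARD, distribution, pivot_paraphrases and the top_n cutoff are illustrative names, not the thesis's code.

```python
from collections import Counter, defaultdict

# Hypothetical outputs pooled from several online translators/dictionaries.
# Pivot phrases carry a language prefix so that identical strings in
# different intermediate languages are kept apart.
FORWARD = {
    "movie": ["fr:film", "fr:film", "de:Spielfilm", "es:película"],
}
BACKWARD = {
    "fr:film": ["film", "movie"],
    "de:Spielfilm": ["feature film", "movie"],
    "es:película": ["film", "movie", "motion picture", "movie"],
}


def distribution(outputs):
    """Relative frequency of each output across all systems queried."""
    counts = Counter(outputs)
    total = sum(counts.values())
    return {text: n / total for text, n in counts.items()}


def pivot_paraphrases(phrase, forward, backward, top_n=6):
    """Score each back-translation e2 by sum over pivots f of p(f|phrase) * p(e2|f)."""
    scores = defaultdict(float)
    for pivot, p_pivot in distribution(forward(phrase)).items():
        for back, p_back in distribution(backward(pivot)).items():
            if back != phrase:  # the source phrase is not its own paraphrase
                scores[back] += p_pivot * p_back
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


result = pivot_paraphrases(
    "movie",
    lambda p: FORWARD.get(p, []),
    lambda p: BACKWARD.get(p, []),
)
print(result)  # [('film', 0.3125), ('feature film', 0.125), ('motion picture', 0.0625)]
```

"film" ranks first because two different pivots ("fr:film" and "es:película") both lead back to it. The scores are left unnormalized after the source phrase is dropped, which is enough for ranking candidates.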
This paper proposes a method that combines rules and statistics to determine keywords, and studies a statistical method for keyword weighting. Experiments show that, for both keyword determination and keyword weighting, the adopted method outperforms the rule-based method, improving accuracy by 3%. This paper also explores how to use paraphrase phrase resources for question expansion and proposes three methods based on language model checking: an N-Best synonymous question expansion algorithm, an N-Best synonymous phrase expansion algorithm, and an improved N-Best synonymous phrase expansion algorithm, all based on language model checking. The paper explains the principle behind these expansion algorithms and compares their performance through multiple experiments. Experiments on the TREC9 evaluation set show that, compared with the original queries, questions expanded with paraphrase phrases improve recall by nearly 3%, and that the N-Best synonymous question expansion algorithm based on language model checking performs best of the three.
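The abstract does not spell out how the N-Best synonymous question expansion algorithm based on language model checking works. One plausible reading, sketched below purely as an assumption: substitute paraphrases into the question to generate candidate synonymous questions, score each candidate with a language model, and keep the N most fluent ones. The add-one smoothed bigram model, the per-bigram score averaging, the toy CORPUS and PARAPHRASES data, and all function names are hypothetical choices, not the thesis's actual components.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical toy corpus and paraphrase table, for illustration only.
CORPUS = [
    "who directed the film titanic",
    "who directed the film avatar",
    "the motion picture won an award",
]
PARAPHRASES = {"movie": ["film", "motion picture", "flick"]}


class BigramLM:
    """Add-one smoothed bigram model; an assumed stand-in for the thesis's LM."""

    def __init__(self, sentences):
        self.context = Counter()  # how often each token appears as a left context
        self.bigrams = Counter()
        vocab = {"</s>"}
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            vocab.update(s.split())
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.v = len(vocab)

    def score(self, sentence):
        """Average log-probability per bigram, so longer questions are not penalised."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))
        total = sum(
            math.log((self.bigrams[p] + 1) / (self.context[p[0]] + self.v))
            for p in pairs
        )
        return total / len(pairs)


def synonymous_questions(question, paraphrases):
    """Every variant of the question with one or more phrases paraphrased."""
    found = [p for p in paraphrases if re.search(r"\b" + re.escape(p) + r"\b", question)]
    if not found:
        return []
    # One regex pass, longest phrase first, so a substitution is never re-substituted.
    alternation = "|".join(re.escape(p) for p in sorted(found, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternation + r")\b")
    candidates = set()
    for combo in product(*([p] + paraphrases[p] for p in found)):
        choice = dict(zip(found, combo))
        variant = pattern.sub(lambda m: choice[m.group(0)], question)
        if variant != question:
            candidates.add(variant)
    return sorted(candidates)


def nbest_expand(question, paraphrases, lm, n=2):
    """Keep the n synonymous questions the language model finds most fluent."""
    candidates = synonymous_questions(question, paraphrases)
    return sorted(candidates, key=lambda q: (-lm.score(q), q))[:n]


def expanded_terms(question, nbest):
    """Union of terms from the original and N-best questions, in first-seen order."""
    terms = []
    for q in [question] + nbest:
        for t in q.split():
            if t not in terms:
                terms.append(t)
    return terms


lm = BigramLM(CORPUS)
best = nbest_expand("who directed the movie titanic", PARAPHRASES, lm, n=2)
print(best)  # ['who directed the film titanic', 'who directed the motion picture titanic']
print(expanded_terms("who directed the movie titanic", best))
```

Here the language-model check filters out "who directed the flick titanic", because "flick" never occurs in the toy corpus. How the N-best questions feed into retrieval is also an assumption: expanded_terms simply unions their terms with the original ones. The two phrase-level variants would presumably apply the same check to individual paraphrase substitutions rather than whole questions; they are not sketched here.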
Keywords/Search Tags: question expansion, question keywords, paraphrase phrases, language model