Research On Key Techniques Of Question Retrieval For Community Question Answering

Posted on: 2015-02-01    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W N Zhang
GTID: 1228330422992629    Subject: Computer Science and Technology
Abstract/Summary:
With the blooming of Web 2.0, Community Question Answering (CQA) services have become one of the most important online resources for information and knowledge seekers. Compared with Web search engines, CQA services directly return natural-language answers to the questions users submit, so users do not have to painstakingly browse through long ranked lists of results to find the correct answers. Compared with traditional Question Answering (QA) systems, the answers in CQA services are written by real-world users; hence, they are generally of higher quality than the answers that traditional QA systems automatically extract and generate from candidate documents. Meanwhile, a tremendous number of high-quality QA pairs contributed by human users has accumulated into comprehensive knowledge bases. The core problem, and the key technology, is finding similar questions that have already been answered in CQA services; we call this task question retrieval.

However, question retrieval in CQA services faces three major challenges: question verboseness, which hinders understanding of the user's intent; word mismatch, which hinders computing the similarity between questions; and ranking questions by text relevance alone, without considering their community aspects. In this study, we address these three key problems from the following four aspects and thereby improve the overall performance of question retrieval in CQA services.

In Section 2, we proposed a dependency relation graph based term weighting scheme to address the question verboseness problem in question retrieval. Specifically, a major drawback shared by term weighting based question retrieval models is that they overlook the relations between term pairs when computing term weights. To tackle this problem, we proposed a novel term weighting scheme that incorporates dependency relation cues between term pairs.
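The idea of refining term weights with dependency relation cues can be illustrated with a small sketch. It is not the dissertation's formula: it assumes the dependency edges are already available (written by hand here rather than produced by a parser), that the relation strength between two terms is the inverse of their hop distance in the dependency graph, and that each refined weight is a `lam`-weighted blend of the term's initial weight and the strength-weighted average of the other terms' weights. The functions `shortest_path_lengths` and `refine_weights` and the parameter `lam` are hypothetical.

```python
from collections import deque


def shortest_path_lengths(edges, source):
    """Breadth-first search over the (undirected) dependency graph.

    Returns a dict mapping each reachable term to its hop distance from `source`.
    """
    adjacency = {}
    for head, dependent in edges:
        adjacency.setdefault(head, set()).add(dependent)
        adjacency.setdefault(dependent, set()).add(head)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def refine_weights(initial, edges, lam=0.5):
    """Refine initial term weights using dependency relation strengths.

    Assumed rule (illustrative only, not the dissertation's):
      r(t, s) = 1 / hops(t, s)   relation strength between terms t and s
      w'(t)   = (1 - lam) * w(t) + lam * sum_s r(t, s) * w(s) / sum_s r(t, s)
    Refined weights are normalised to sum to 1; initial weights must be positive.
    """
    refined = {}
    for term, weight in initial.items():
        dist = shortest_path_lengths(edges, term)
        # Strength to every other weighted term reachable in the graph.
        strengths = {s: 1.0 / d for s, d in dist.items() if d > 0 and s in initial}
        total = sum(strengths.values())
        related = sum(r * initial[s] for s, r in strengths.items()) / total if total else 0.0
        refined[term] = (1 - lam) * weight + lam * related
    norm = sum(refined.values())
    return {term: w / norm for term, w in refined.items()}


if __name__ == "__main__":
    # Hypothetical content terms of "how to install python on windows",
    # with hand-written dependency edges and IDF-like initial weights.
    edges = [("install", "python"), ("install", "windows")]
    initial = {"install": 1.0, "python": 2.0, "windows": 1.5}
    for term, w in sorted(refine_weights(initial, edges).items()):
        print(f"{term}: {w:.3f}")
```

In this toy question, "install" starts with the lowest weight but is directly attached to both higher-weighted terms, so its relative weight rises after refinement; under this assumed rule, terms that sit at the center of the dependency structure are promoted.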
Given a question, we first constructed a dependency graph and computed the relation strength between each pair of terms. Next, based on these dependency relation scores, we refined the initial term weights estimated by conventional term weighting approaches. We demonstrated that the updated term weights can be seamlessly integrated with popular question retrieval models. Comprehensive experiments validated the proposed scheme and showed that it achieved promising performance compared with state-of-the-art methods.

In Section 3, we proposed a phrasal paraphrase based question reformulation model to improve question query expansion. The lexical gap in CQA search, caused by the variability of language, is recognized as an important and widespread phenomenon. To address it, we presented a question reformulation scheme that enhances the question retrieval model by fully exploiting paraphrases at the phrase level. Given a question in natural language, our scheme first detected the key phrases it contains by jointly integrating corpus-dependent knowledge and question-aware cues. Next, it automatically extracted paraphrases for each identified key phrase using multiple online translation engines, and then selected the most relevant reformulations from a large pool of question rewrites formed by fully permuting and combining the generated paraphrases. Extensive evaluations on a real-world data set demonstrated that our model is able to characterize complex questions and achieves promising performance compared with state-of-the-art methods.

In Section 4, we proposed a topic translation and clustering based model to implement term expansion for question queries. Concretely, the ranking scheme of statistical translation based question retrieval models mainly depends on the translation probabilities between terms.
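To make the dependence on term translation probabilities concrete, here is a minimal sketch of one common formulation of translation-based question retrieval; it is the generic baseline, not the topic-based model proposed in this dissertation. Each query word w is generated from a candidate question D by mixing the maximum-likelihood estimate P_ml(w|D) with a translation component, the sum over document terms t of T(w|t) P_ml(t|D), and the mixture is smoothed with the collection model. The translation table, the weights `beta` and `lam`, and the toy archive are hypothetical; in practice T(w|t) would typically be estimated from a parallel corpus such as question-answer pairs.

```python
import math
from collections import Counter


def translation_lm_score(query, doc, collection, trans, beta=0.5, lam=10.0):
    """Log-likelihood of `query` under a translation-smoothed model of `doc`.

    query, doc:  lists of tokens (an archived question plays the document).
    collection:  Counter of token frequencies over the whole archive.
    trans:       dict mapping (w, t) -> T(w | t), the probability that
                 document term t "translates" into query term w.
    """
    doc_tf = Counter(doc)
    doc_len = len(doc)
    coll_len = sum(collection.values())
    score = 0.0
    for w in query:
        p_ml = doc_tf[w] / doc_len
        # Translation component: sum over document terms t of T(w|t) * P_ml(t|D).
        p_tr = sum(trans.get((w, t), 0.0) * tf / doc_len for t, tf in doc_tf.items())
        p_mix = (1 - beta) * p_ml + beta * p_tr
        p_coll = collection[w] / coll_len
        # Dirichlet-style smoothing with the collection model.
        p = (doc_len / (doc_len + lam)) * p_mix + (lam / (doc_len + lam)) * p_coll
        # Floor avoids log(0) for query words unseen anywhere in the archive.
        score += math.log(max(p, 1e-12))
    return score


if __name__ == "__main__":
    archive = {
        "q1": ["install", "python", "windows"],
        "q2": ["setup", "python", "windows"],
        "q3": ["cook", "pasta"],
    }
    collection = Counter(tok for doc in archive.values() for tok in doc)
    trans = {("install", "setup"): 0.4}  # hypothetical translation table
    query = ["install", "python"]
    ranking = sorted(
        archive,
        key=lambda qid: translation_lm_score(query, archive[qid], collection, trans),
        reverse=True,
    )
    print(ranking)
```

In the toy run, the question phrased with "setup" receives probability mass for the query word "install" only through T(install | setup), even though "install" never appears in it. A noisy translation table lets spurious term pairs influence the ranking in exactly the same way.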
However, existing translation based models suffer from the noise generated by the translation model, which in turn degrades the question retrieval results. We therefore proposed a topic inference and clustering based translation model for question retrieval. By leveraging topic inference information and the similarity between topics, we theoretically verified that the model can reasonably control translation noise and thereby improve the question retrieval results. Experimental results showed that the proposed model significantly outperformed state-of-the-art question retrieval models in MAP, MRR and P@1.

In Section 5, we first proposed a question popularity prediction task and then improved the question retrieval results by utilizing the predicted question popularity. Specifically, with the blooming of CQA services, a large number of high-quality question and answer pairs have accumulated, and these services allow users not only to share their knowledge with others but also to interact with each other. Considerable effort has been devoted to utilizing text content for question retrieval in CQA services, whereas few studies have measured the impact of user profiles and interactive behaviors on question retrieval. Question popularity can reflect user attention, interest and interactive behaviors in CQA services; hence, predicting question popularity can further improve users' experience of searching for similar questions. We first analyzed and modeled the factors that affect question popularity, so that the popularity of new questions can be predicted. We then re-ranked the question retrieval results by exploiting the predicted question popularity. Experimental results showed that the popularity based question retrieval results outperformed the text relevance based question retrieval results.
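The re-ranking step can be sketched as a simple score interpolation. This is an assumed combination rule, not the one used in the dissertation: it takes text-relevance scores from any retrieval model, treats the predicted popularity values as given (the prediction model itself is not reproduced), min-max normalizes both signals to [0, 1], and orders questions by alpha × relevance + (1 − alpha) × popularity. The function `rerank_by_popularity` and the weight `alpha` are hypothetical.

```python
def rerank_by_popularity(results, popularity, alpha=0.7):
    """Re-rank retrieved questions by interpolating relevance and predicted popularity.

    results:     list of (question_id, relevance_score) from a text retrieval model.
    popularity:  dict question_id -> predicted popularity (any numeric scale).
    Both signals are min-max normalised to [0, 1]; the final score is
    alpha * relevance + (1 - alpha) * popularity.
    """
    if not results:
        return []

    def minmax(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    ids = [qid for qid, _ in results]
    relevance = minmax([score for _, score in results])
    # Questions without a prediction are treated as having popularity 0.
    pop = minmax([popularity.get(qid, 0.0) for qid in ids])
    combined = [(qid, alpha * r + (1 - alpha) * p) for qid, r, p in zip(ids, relevance, pop)]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    results = [("q1", 2.0), ("q2", 1.9), ("q3", 0.5)]
    predicted = {"q1": 3, "q2": 40, "q3": 10}  # hypothetical predicted popularity
    print(rerank_by_popularity(results, predicted))
```

With alpha = 0.7 in this example, a question whose relevance is only slightly below the top result but whose predicted popularity is much higher moves to first place; alpha controls how far popularity may override text relevance.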
Keywords/Search Tags: community based question answering, question retrieval, question refinement, question reformulation, question expansion, question popularity prediction