
The Study On Question Retrieval Technology In Community Question Answer System

Posted on: 2015-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: H T Yang
GTID: 2298330467986695
Subject: Computer application technology
Abstract/Summary:
The development of Internet technology has made people's daily lives more convenient, but it has also left them drowning in an ocean of information, where it is hard to find the information they actually care about and need. With the rapid development of Web 2.0, more and more shortcomings of traditional search engines have been exposed. For example, they cannot retrieve answers to specialized questions effectively, and they offer users no interactive experience. Community Question Answering (CQA), which has emerged in recent years, makes up for these shortcomings to a certain extent and now offers users a new search experience. In a CQA system, people can freely post their own questions, which are then answered by other users. Because anyone can ask and answer questions, CQA systems such as Yahoo! Answers have accumulated a large number of question-answer pairs, and how to use these pairs effectively has become a focus for many researchers. Question retrieval aims to use previously archived question-answer pairs to find questions that are the same as or similar to a user's question, thereby shortening the time the user has to wait for an answer. However, because natural language contains many synonyms as well as varied semantic and syntactic characteristics, finding similar questions in a CQA system is not an easy task.

This paper focuses on question retrieval and addresses three problems that arise during the retrieval process. The first is question ambiguity caused by the lack of semantic information during retrieval. Because natural language contains many synonyms and varied semantic and syntactic characteristics, relying only on the surface features of the words themselves makes question retrieval difficult.
To address this problem, we propose a feature-fusion method for computing the similarity between CQA questions. It combines statistical features, word-order features, semantic features, and features of the answers associated with each question.

The second problem is retrieval efficiency. To improve it, this paper proposes a retrieval model that fuses the category information of a question with the category information of its corresponding answer. The model uses this category information to filter out irrelevant questions, which improves both the efficiency and the performance of question retrieval.

The third problem is that misclassification degrades the retrieval results. To address it, this paper proposes a retrieval model that fuses the topic information of a question with the topic information of its corresponding answer. The model uses this topic information to merge similar question categories, which reduces the negative effect of misclassification on the retrieval results.

Finally, we evaluated each of the three models separately on real annotated datasets from Yahoo! Answers. Comparative experiments from multiple perspectives show that each of the three models achieves good performance on the task it targets.
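As a rough illustration of how a feature-fusion similarity and category-based filtering might fit together, here is a minimal Python sketch. It is not the method from this thesis, whose details the abstract does not give. The tokenizer, the synonym table (a stand-in for whatever semantic resource the original work used), the word-order measure, the fusion weights, and the `merged_groups` parameter (a stand-in for topic-based category merging) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass

# HYPOTHETICAL synonym table mapping words to a shared concept; a stand-in
# for a real semantic resource, which the abstract does not specify.
SYNONYMS = {"car": "automobile", "auto": "automobile", "fix": "repair"}

# HYPOTHETICAL fusion weights for the four feature scores (they sum to 1).
WEIGHTS = {"statistical": 0.4, "order": 0.2, "semantic": 0.3, "answer": 0.1}


@dataclass
class QAPair:
    question: str
    answer: str
    question_category: str  # category assigned to the archived question
    answer_category: str    # category assigned to its answer


def tokenize(text):
    """Lowercase whitespace tokenizer that strips simple punctuation."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    """Statistical feature: cosine of the term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def order_similarity(a, b):
    """Word-order feature: share of shared-word pairs kept in the same order."""
    pos_a, pos_b = {}, {}
    for i, w in enumerate(a):
        pos_a.setdefault(w, i)  # first occurrence only
    for i, w in enumerate(b):
        pos_b.setdefault(w, i)
    shared = [w for w in pos_a if w in pos_b]
    pairs = [(x, y) for i, x in enumerate(shared) for y in shared[i + 1:]]
    if not pairs:  # fewer than two shared words: no order evidence
        return 0.0
    same = sum((pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs)
    return same / len(pairs)


def semantic_similarity(a, b):
    """Semantic feature: Jaccard overlap after mapping words to concepts."""
    ca = {SYNONYMS.get(w, w) for w in a}
    cb = {SYNONYMS.get(w, w) for w in b}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def fused_similarity(query, pair):
    """Weighted linear fusion of the four feature scores."""
    q, c, ans = tokenize(query), tokenize(pair.question), tokenize(pair.answer)
    scores = {
        "statistical": cosine(q, c),
        "order": order_similarity(q, c),
        "semantic": semantic_similarity(q, c),
        "answer": cosine(q, ans),  # query vs. the archived answer text
    }
    return sum(WEIGHTS[name] * s for name, s in scores.items())


def retrieve(query, query_category, archive, merged_groups=(), top_k=3):
    """Filter the archive by category, then rank by fused similarity.

    merged_groups is a HYPOTHETICAL list of category groups (for example,
    produced by topic-based merging) treated as interchangeable, so a
    misclassified candidate in a related category is not discarded.
    """
    allowed = {query_category}
    for group in merged_groups:
        if query_category in group:
            allowed |= set(group)
    candidates = [p for p in archive
                  if p.question_category in allowed or p.answer_category in allowed]
    candidates.sort(key=lambda p: fused_similarity(query, p), reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    archive = [
        QAPair("How do I repair my car?", "Take the auto to a mechanic.", "Cars", "Cars"),
        QAPair("How do I fix a bike tire?", "Use a patch kit.", "Cycling", "Cycling"),
        QAPair("Best way to fix an automobile engine?", "Check the spark plugs.", "Autos", "Cars"),
    ]
    for p in retrieve("How can I fix my car?", "Cars", archive):
        print(p.question)
```

Run as a script, the demo prints "How do I repair my car?" first and "Best way to fix an automobile engine?" second. The engine question survives the filter only because its answer category matches, which is the point of fusing question and answer categories. The bike-tire question is dropped unless `merged_groups=[("Cars", "Cycling")]` is passed. A weighted linear sum is simply the easiest fusion scheme to show; in practice the weights would be tuned on annotated data.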
Keywords/Search Tags: Community Question Answer, Search Engine, Question Retrieval, Question Similarity