
Question Similarity Computation And Classification In Community Question Answer

Posted on: 2014-02-24 | Degree: Master | Type: Thesis
Country: China | Candidate: D P Xiong | Full Text: PDF
GTID: 2248330395499948 | Subject: Computer application technology
Abstract/Summary:
The development of Internet technology has made daily life more convenient for more and more people, but it has also left them drowning in a sea of information, so that it is difficult to find the information they need in time. This is the phenomenon of information overload. With the rapid development of Web 2.0, people want to exchange knowledge in natural language within a community in order to obtain the information they need. As a result, a large number of community question answering (CQA) systems have emerged to meet this demand. On the one hand, in a CQA system a user can either ask a question and wait for other users to answer it, or directly search previously asked questions and retrieve their answers. The latter raises the problem of question similarity computation. On the other hand, over time a CQA system accumulates a large archive of question-answer (QA) pairs, which must be classified correctly to keep the system robust. This is the problem of question classification. The main contributions of this thesis are therefore as follows.

First, traditional question answering (QA) systems, such as those in the TREC QA task, only find answers directly to simple questions, offer no user interaction, and are not sufficient for answering real-world questions. CQA systems, by contrast, contain large archives of QA pairs that can be reused. We propose a new retrieval framework based on LDA topics for the similar-question matching problem, which computes the similarity between questions from three kinds of information: statistical, semantic, and topical. The statistical information comes from a retrieval model based on the vector space model (VSM); the semantic information comes from a WordNet-based retrieval model; and the topical information comes from an LDA-based topic retrieval model.
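The three similarity signals described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the VSM part uses raw term-frequency cosine (no TF-IDF weighting); a small hand-written synonym table (`synonyms`, hypothetical) stands in for WordNet; the topic part scores precomputed LDA document-topic distributions with 1 minus the Jensen-Shannon divergence; and the weighted sum with weights (0.4, 0.3, 0.3) is only one plausible way to combine the three scores, with illustrative weights.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def vsm_similarity(q1, q2):
    # Statistical signal: cosine over raw term-frequency vectors (no IDF, for brevity).
    return cosine(Counter(q1), Counter(q2))

def semantic_similarity(q1, q2, synonyms):
    # Semantic signal: toy stand-in for WordNet; each word is mapped to a
    # canonical synonym (hypothetical table) before comparing term vectors.
    def normalize(q):
        return Counter(synonyms.get(w, w) for w in q)
    return cosine(normalize(q1), normalize(q2))

def topic_similarity(theta1, theta2):
    # Topic signal: 1 - Jensen-Shannon divergence (base 2, so the result lies in [0, 1])
    # between two LDA document-topic distributions, assumed to be precomputed.
    m = [(a + b) / 2 for a, b in zip(theta1, theta2)]
    def kl(p, q):
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return 1.0 - 0.5 * kl(theta1, m) - 0.5 * kl(theta2, m)

def combined_similarity(q1, q2, theta1, theta2, synonyms, weights=(0.4, 0.3, 0.3)):
    # Illustrative weighted sum of the three signals; the weights are assumptions.
    a, b, c = weights
    return (a * vsm_similarity(q1, q2)
            + b * semantic_similarity(q1, q2, synonyms)
            + c * topic_similarity(theta1, theta2))

# Usage with made-up tokens and topic distributions.
q1 = "how do i fix my laptop".split()
q2 = "how can i repair my notebook".split()
synonyms = {"fix": "repair", "notebook": "laptop"}
theta1, theta2 = [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]
score = combined_similarity(q1, q2, theta1, theta2, synonyms)
```

Here the synonym mapping lets "fix/repair" and "laptop/notebook" match, so the semantic score exceeds the purely lexical VSM score. This is the kind of gap a WordNet-based model is meant to close.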
The overall similarity is then a combination of these three similarities.

Second, in community-based question answering services, a user who submits a question does not yet know the answer and therefore may not be sure which category the question belongs to. The user can thus post the question without choosing a category, and once the question has been resolved, we classify it using its answers. This prevents users from tagging questions with arbitrary categories, which would throw the classification system into disorder. In addition, when new topics appear and a CQA site's category scheme becomes unwieldy or inappropriate, the site needs to reorganize its classification; because the existing questions have already been resolved, they can be reclassified using their answers. Question classification is therefore very important for CQA sites. We propose two methods to address these problems. The first is a general classification model that combines a question classifier and an answer classifier, both based on surface text. The second uses a mapping function to enrich questions with semantic knowledge drawn from their answers, which tackles data sparseness, and then classifies the enriched questions with an SVM.

Finally, experiments were carried out on a real-world annotated data set sampled from Yahoo! Answers, and the results were evaluated with several evaluation metrics. The results show that the proposed methods improve on traditional methods and achieve good results.
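The first classification approach described above, combining a question classifier with an answer classifier over surface text, can be sketched as follows. The abstract does not name the base classifier or the combination rule, so both are assumptions here: each classifier is a multinomial naive Bayes with add-one smoothing, and their per-category log scores are linearly interpolated with a hypothetical weight `lam`. The SVM-based second method is not shown.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words tokens with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, y in zip(docs, labels):
            self.doc_counts[y] += 1
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        # Log prior plus smoothed log likelihood of each token, per category.
        total = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for y, n in self.doc_counts.items():
            wc = self.word_counts[y]
            denom = sum(wc.values()) + v
            s = math.log(n / total)
            for t in tokens:
                s += math.log((wc[t] + 1) / denom)
            scores[y] = s
        return scores

def combined_predict(q_clf, a_clf, question, answer, lam=0.5):
    # Interpolate question-side and answer-side log scores (lam is hypothetical).
    # Both classifiers are assumed to be trained on the same category set.
    qs = q_clf.log_scores(question)
    ans = a_clf.log_scores(answer)
    return max(qs, key=lambda y: lam * qs[y] + (1 - lam) * ans[y])

# Usage on tiny made-up QA pairs (not data from the thesis).
questions = [["best", "laptop", "for", "games"], ["how", "to", "cook", "rice"]]
answers = [["buy", "a", "gpu", "with", "lots", "of", "ram"],
           ["boil", "water", "then", "add", "rice"]]
labels = ["Computers", "Food"]
q_clf = NaiveBayes().fit(questions, labels)
a_clf = NaiveBayes().fit(answers, labels)
```

Training the answer classifier separately lets a resolved question borrow evidence from its answers, which usually contain more vocabulary than the short question text. Setting `lam=1.0` falls back to a question-only classifier.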
Keywords/Search Tags: Community Question Answer, Question Similarity, Question Classification, Machine Learning