
Research On The Key Technologies For Web Community Question Answering Retrieval

Posted on: 2015-12-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Chan
GTID: 1108330464455429
Subject: Computer software and theory
Abstract/Summary:
Community Question Answering (cQA) is a web service through which community members communicate with one another to ask questions and seek advice. Because cQA services (cQAs) have accumulated a large amount of valuable data generated from human knowledge and experience, they have become a popular alternative for users whose queries fail in traditional search engines. In cQAs, people can ask questions in natural language and have them answered by other community participants. They can also search the question archives for semantically equivalent or relevant questions and, where appropriate, reuse the corresponding answers. For most non-factoid cQA questions, especially those involving personal context or requests for advice, question retrieval is more effective than the traditional strategy of retrieving relevant text passages and then extracting candidate answers. For this reason, question retrieval is becoming one of the most important components of next-generation intelligent search.

Sparse learning has recently emerged as a new branch of statistical learning. This dissertation introduces sparse regularization to explore several key technologies in community question retrieval: automatic answer summarization for complex multi-sentence questions, hierarchical question topic categorization, and improved question retrieval models. The main work and contributions are as follows:

1. Automatic answer summarization: In cQAs, a complex multi-sentence question may contain many sub-questions together with their contexts. Its "best answer" often suffers from the "incomplete answer" problem, i.e., it misses valuable information contained in other answers. We present a novel answer summarization method that automatically generates a novel, non-redundant summary of the community answers. The method models local and non-local interactions between sentences in a general Conditional Random Field (CRF) and exploits the abundant cQA features through group L1 regularization.

2. Question topic categorization: When people post a question to a cQA service, they are often asked to select a hierarchical category label that covers the question's semantic topic. This label helps recommend the question to suitable answerers and makes it easier to browse and retrieve questions from the cQA archives. However, manual labeling requires every user to be sufficiently familiar with the entire category hierarchy, which is costly and degrades the user experience. To save the effort of manual classification, we present a hierarchical kernelized classification model that automatically classifies general cQA questions into their corresponding topic categories. We explore and optimally combine various cQA features through a multiple kernel learning strategy. We also propose a hybrid regularization approach that combines an orthogonal constraint with L1 sparsity, both to improve discriminative power among similar topics and to reduce the number of model parameters.

3. Question retrieval model: To make the cQA archives more accessible, we present a novel method that improves question retrieval performance by exploiting the hierarchical classification process. Existing methods often use query term frequencies as the local importance of terms, which cannot distinguish informative terms when every query term occurs only once. Unlike previous work, we propose a hierarchical question classification method with sparse regularization that mimics how users label questions in cQAs. From this classification process we select the informative query terms and obtain their local weights. Moreover, we propose a reranking method that smooths the retrieval scores and further improves retrieval performance.

Most of the methods proposed in this dissertation use a sparse term to regularize the model parameters. Sparse models have three merits. First, they reduce the number of parameters: sparse regularization reduces the number of features a model uses, so less training data is needed to fit it. This helps prevent the over-fitting to which dense models with too many parameters are prone, and lets the model generalize better to new data. Second, they improve efficiency: fewer parameters need to be stored, so computation time is also reduced. Third, they help uncover relational dependencies: a sparse representation drops useless features and keeps those that are truly important for model inference. As a result, the proposed methods are not only effective for question retrieval but also relevant to other web applications such as retrieval with long, verbose queries, web document classification, and summarization.

We conduct a series of empirical experiments on datasets from the real-world cQA site Yahoo! Answers. The experimental results show the effectiveness of the proposed methods compared with several state-of-the-art methods and our strong baselines.
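The group L1 regularization used in the answer summarization model can be illustrated through its proximal operator, block soft-thresholding. This operator shrinks each feature group as a unit and zeroes out any group whose L2 norm falls below the threshold, so an entire family of cQA features is either kept or dropped together. The following is a minimal generic sketch, not the dissertation's CRF training code. The function name `group_l1_prox` and the feature-group names are hypothetical.

```python
import math
from typing import Dict, List


def group_l1_prox(weights: Dict[str, List[float]], step: float,
                  lam: float) -> Dict[str, List[float]]:
    """Proximal step for the group L1 penalty lam * sum_g ||w_g||_2.

    Each group w_g is scaled by max(0, 1 - step*lam / ||w_g||_2), so any
    group whose L2 norm is at most step*lam becomes exactly zero.
    """
    threshold = step * lam
    result: Dict[str, List[float]] = {}
    for name, w in weights.items():
        norm = math.sqrt(sum(x * x for x in w))
        if norm <= threshold:
            # The whole feature group is pruned at once.
            result[name] = [0.0] * len(w)
        else:
            scale = 1.0 - threshold / norm
            result[name] = [scale * x for x in w]
    return result


# Hypothetical feature groups for a sentence-level model (names are illustrative).
w = {
    "lexical": [0.9, -1.2],       # L2 norm 1.5: shrunk by a factor of 2/3
    "user_profile": [0.1, 0.05],  # L2 norm about 0.11: zeroed
    "position": [0.0, 0.3],       # L2 norm 0.3: zeroed
}
print(group_l1_prox(w, step=1.0, lam=0.5))
```

Operating on whole groups instead of single weights is what lets a group penalty discard an entire feature type (for example, all user-profile features) when it does not help.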
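Multiple kernel learning, as mentioned for the categorization model, combines one base kernel per feature type into a single kernel. A common form is a convex combination K(x, y) = sum_m beta_m K_m(x_m, y_m) with beta_m >= 0 and sum_m beta_m = 1. The sketch below only evaluates such a combined kernel for fixed weights and does not learn the weights. The two feature "views" per question, the choice of linear and RBF base kernels, and the weight values are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Inner product kernel, e.g. for a bag-of-words text view."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(gamma: float) -> Kernel:
    """Return a Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    def k(x: Vector, y: Vector) -> float:
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-gamma * sq)
    return k


def combined_kernel(views_x: List[Vector], views_y: List[Vector],
                    kernels: List[Kernel], betas: List[float]) -> float:
    """K(x, y) = sum_m beta_m * K_m(x_m, y_m), with betas on the simplex."""
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("betas must be non-negative and sum to 1")
    return sum(b * k(xm, ym)
               for b, k, xm, ym in zip(betas, kernels, views_x, views_y))


# Hypothetical questions, each with a text view and a metadata view.
q1 = [[1.0, 0.0, 1.0], [0.2, 0.4]]
q2 = [[1.0, 1.0, 0.0], [0.2, 0.0]]
k12 = combined_kernel(q1, q2, [linear_kernel, rbf_kernel(0.5)], [0.7, 0.3])
print(round(k12, 4))
```

Restricting the weights to be non-negative keeps the combination a valid (positive semi-definite) kernel, because a non-negative sum of valid kernels is itself a valid kernel.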
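The hybrid regularizer for topic categorization is described only at a high level. One plausible reading is assumed here and is not taken from the dissertation. It adds an L1 term over all weights to a penalty on the squared inner products between the weight vectors of sibling categories. This pushes similar topics toward relying on different features: Omega(W) = lam_l1 * sum_c ||w_c||_1 + lam_orth * sum_{c != c'} (w_c . w_c')^2. The hypothetical function `hybrid_penalty` evaluates that assumed form.

```python
from typing import List


def hybrid_penalty(W: List[List[float]], lam_l1: float, lam_orth: float) -> float:
    """Assumed hybrid regularizer over sibling category weight vectors W[c]:

    lam_l1 * sum_c ||w_c||_1 + lam_orth * sum_{c != c'} (w_c . w_c')^2
    (each unordered pair of categories is counted twice).
    """
    # L1 sparsity term over every weight of every category.
    l1 = sum(abs(x) for w in W for x in w)
    # Orthogonality term: penalize overlap between different categories.
    orth = 0.0
    for i in range(len(W)):
        for j in range(len(W)):
            if i != j:
                dot = sum(a * b for a, b in zip(W[i], W[j]))
                orth += dot * dot
    return lam_l1 * l1 + lam_orth * orth


# Two sibling topics using the same features vs. disjoint features.
overlapping = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
disjoint = [[1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
print(hybrid_penalty(overlapping, lam_l1=0.1, lam_orth=1.0))
print(hybrid_penalty(disjoint, lam_l1=0.1, lam_orth=1.0))
```

Both weight matrices have the same L1 norm, so only the orthogonality term separates them. The overlapping pair is penalized much more heavily, which is the intended pressure toward discriminating between similar topics.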
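The motivation behind the question retrieval model can be made concrete. If every query term occurs once, term-frequency weights are all equal and cannot tell informative terms from uninformative ones. The sketch below contrasts uniform TF weights with per-term weights assumed to come from a sparse classifier (given here as a hypothetical dictionary). It then applies a simple linear interpolation with a category-level score as a stand-in for score smoothing during reranking. The scoring function, the weights, the category scores, and `alpha` are all illustrative assumptions, not the dissertation's retrieval or reranking model.

```python
from collections import Counter
from typing import Dict, List


def weighted_match_score(query: List[str], candidate: List[str],
                         term_weights: Dict[str, float]) -> float:
    """Sum the weights of query terms that also appear in the candidate question."""
    cand = set(candidate)
    return sum(term_weights.get(t, 0.0) for t in query if t in cand)


def smooth_scores(scores: Dict[str, float], category_scores: Dict[str, float],
                  alpha: float) -> Dict[str, float]:
    """Interpolate each retrieval score with a category-level score (assumed form)."""
    return {q: (1 - alpha) * s + alpha * category_scores.get(q, 0.0)
            for q, s in scores.items()}


query = ["how", "to", "fix", "laptop", "overheating"]
# Every term occurs once, so TF weights are all 1.0.
tf_weights = {t: float(c) for t, c in Counter(query).items()}
# Hypothetical weights from a sparse classifier: stopword-like terms get none.
clf_weights = {"laptop": 0.9, "overheating": 1.2, "fix": 0.3}

archive = {
    "q1": ["how", "to", "cook", "rice", "fast"],
    "q2": ["laptop", "keeps", "overheating"],
}

tf_scores = {q: weighted_match_score(query, d, tf_weights) for q, d in archive.items()}
clf_scores = {q: weighted_match_score(query, d, clf_weights) for q, d in archive.items()}
# Hypothetical category-match scores used only for the smoothing step.
smoothed = smooth_scores(clf_scores, {"q1": 0.1, "q2": 0.8}, alpha=0.2)
print(tf_scores, clf_scores, smoothed)
```

With TF weights the unrelated question q1 ties with q2, because both share two query terms. With classifier-derived weights, only the informative terms count, and q2 ranks first.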
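The first merit of sparse models, fewer parameters, follows from how an L1 penalty acts during optimization: small coefficients are thresholded to exactly zero rather than merely shrunk. The sketch below is a generic textbook solver (ISTA, proximal gradient descent) for the lasso objective 0.5 * ||y - Xw||^2 + lam * ||w||_1. It is not the optimizer used in the dissertation. The function names are hypothetical.

```python
from typing import List


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink toward zero and clip at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_lasso(X: List[List[float]], y: List[float], lam: float,
               step: float, iters: int = 200) -> List[float]:
    """Minimize 0.5*||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA).

    `step` should be at most 1/L, where L is the largest eigenvalue of X^T X.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residual r = Xw - y.
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # Gradient of the smooth part: X^T r.
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(d)]
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w


# Orthonormal design: the lasso solution is the soft-thresholded least-squares fit.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.2, -1.5]
w = ista_lasso(X, y, lam=0.5, step=1.0)
print(w, "non-zero parameters:", sum(1 for v in w if v != 0.0))
```

In this toy case the weak second coefficient is removed entirely, so the fitted model keeps two of its three parameters. The same mechanism, at larger scale, is what reduces storage and computation.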
Keywords/Search Tags: Community Question Answering Service, Answer Summarization, Hierarchical Kernelized Categorization, Question Retrieval, Sparse Regularization