Forum Data Based QA Mining

Posted on: 2009-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2178360272986756
Subject: Computer application technology
Abstract/Summary:
Online forums contain a huge amount of valuable user-generated content from which many useful question-answer pairs can be mined. Such pairs could be used, for example, to improve the performance of question-answering systems and to augment the knowledge bases of chatbots.

In this thesis, we address the problem of mining question-answer pairs from forums with a new information extraction method that consists of two critical parts: question detection and answer detection. We propose an LSP (Labeled Sequential Patterns) based classification method to detect questions in a forum thread. This method performs well in both precision and recall. Graph-based ranking methods have been shown to be effective in information retrieval. Inspired by these methods, we propose a graph-based propagation method to detect answers to questions in the same thread. We build a weighted directed graph to represent the relationships among the candidate answers. The weight of each edge is computed by a linear interpolation of several factors, including the similarity between the candidate answers, the distance of the answer from the question, and the authority of the candidate answer's author. To propagate authority, we use two approaches: propagation with an initial score and propagation without one. We also try different approaches to integrating our methods with IR models.

Extensive experiments are carried out on small-scale forum data annotated by hand. The results show that our question detection method is superior to existing methods in both precision and recall, and that our answer detection method outperforms the others on all measures, including MRR and MAP. We then apply these methods to raw large-scale forum data, and a sampling investigation shows that the techniques are very promising.
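The abstract does not give the formulas behind the graph-based propagation step, so the following Python sketch is only one plausible reading of it, not the thesis's actual method. It assumes a bag-of-words cosine similarity, an inverse-distance term, a PageRank-style damped update, and hypothetical interpolation weights `alpha`, `beta`, `gamma`; all function and parameter names are illustrative.

```python
from collections import Counter
from math import sqrt


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (an assumed similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def edge_weights(candidates, authority, alpha=0.5, beta=0.3, gamma=0.2):
    """Row-normalized weights w[i][j] for the edge from candidate i to candidate j.

    candidates: list of (text, distance_from_question, author) tuples,
                with distance_from_question >= 1 (posts after the question).
    authority:  dict mapping author -> score in [0, 1].
    alpha, beta, gamma are hypothetical interpolation weights.
    """
    n = len(candidates)
    w = [[0.0] * n for _ in range(n)]
    for i, (text_i, _, _) in enumerate(candidates):
        for j, (text_j, dist_j, author_j) in enumerate(candidates):
            if i == j:
                continue  # no self-loops
            # Linear interpolation of similarity, closeness to the question,
            # and the authority of the target candidate's author.
            w[i][j] = (alpha * cosine(text_i, text_j)
                       + beta * (1.0 / dist_j)
                       + gamma * authority.get(author_j, 0.0))
        total = sum(w[i])
        if total:
            w[i] = [x / total for x in w[i]]
    return w


def propagate(w, initial=None, damping=0.85, iterations=50):
    """Damped score propagation over the weighted graph.

    initial=None propagates without an initial score (uniform prior);
    otherwise the given scores (e.g. from an IR model) act as the prior.
    """
    n = len(w)
    if n == 0:
        return []
    if initial is None or sum(initial) == 0:
        prior = [1.0 / n] * n
    else:
        s = sum(initial)
        prior = [x / s for x in initial]
    scores = prior[:]
    for _ in range(iterations):
        scores = [(1 - damping) * prior[j]
                  + damping * sum(scores[i] * w[i][j] for i in range(n))
                  for j in range(n)]
    return scores


if __name__ == "__main__":
    # Made-up thread: three candidate answers following one question.
    candidates = [
        ("try reinstalling the graphics driver", 1, "alice"),
        ("reinstalling the driver fixed it for me", 2, "bob"),
        ("thanks everyone", 3, "carol"),
    ]
    authority = {"alice": 0.9, "bob": 0.4, "carol": 0.1}
    w = edge_weights(candidates, authority)
    print(propagate(w))                           # without initial score
    print(propagate(w, initial=[0.6, 0.3, 0.1]))  # with initial score
```

Passing `initial` corresponds to propagation with an initial score, for instance a relevance score from an IR model; leaving it out corresponds to propagation without one. The candidate with the highest final score would be taken as the most likely answer.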
Keywords/Search Tags: forum data, QA mining, information extraction, labeled sequential patterns, graph-based ranking