Research Of Passage Retrieval For Question Answering Systems

Posted on: 2011-01-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Li
Full Text: PDF
GTID: 1118360305466711
Subject: Computer application technology
Abstract/Summary:
The rapid development of the Web has made it a huge information source and an important platform on which people exchange and share knowledge. For example, users can easily acquire information from the Web with the help of search engines. However, the amount of information on the Web is so large that it is difficult for users to identify and select what is valuable. Hence, how to accurately retrieve and extract the information users need has become an important research topic. Question answering (QA) systems are an important research direction for next-generation search engines. A QA system has two distinguishing features: first, it allows users to submit a query as a natural language question instead of keywords; second, it responds with a concise, exact answer instead of a list of documents. Users can describe their information needs accurately, and the QA system can understand those needs and respond correctly.

The document retrieval module is an important component of a QA system. Usually, the retrieved documents undergo several computation-intensive procedures, including natural language processing, information extraction, and pattern matching, to determine the most likely answers. A QA system can be more efficient if it reduces the size of each document to be processed. For this purpose, a passage retrieval module is added as an intermediate stage between the document retrieval module and the answer extraction module. The research issues and contributions of this thesis are as follows.

1) The thesis analyzes the evaluation methods for document relevance and passage relevance. Because document relevance measures are mainly based on density-based lexical relevance, they cannot be applied directly to passage retrieval. The thesis discusses the definition of question answering passage retrieval and demonstrates the differences between document retrieval and passage retrieval in terms of topic, length, and keywords. Based on these differences, it proposes heuristic rules for designing passage retrieval formulas that better fit the requirements of QA systems.

2) A Web-based question answering passage retrieval method is proposed. The thesis gives the definition of passage retrieval and introduces the basic workflow and the function of each component. It then proposes a heuristic method for transforming questions into queries for passage retrieval. Keywords are not treated independently; instead, the constraint relations among them are used to perform keyword matching and calculate the relevance score.

3) A novel mixture relevance model based on multiple features is proposed. The thesis explores the effectiveness of lexical similarity, topic similarity, and structure similarity for passage retrieval. A Web-based method for computing the similarity between words is proposed and used to calculate the lexical similarity between a question and a passage. The thesis then proposes a probabilistic topic language model to calculate the topic similarity between a question and a passage. For structure similarity, two structures are mainly considered: "wh-movement" and "predicate-argument". The three similarity metrics are then integrated into a weighted-average metric for evaluating the relevance between a passage and a question.

4) A passage retrieval method based on a passage-passage graph model is proposed. It considers not only the relevance of each individual passage but also the relations between passages that influence those independent relevance scores. The relations between passages are used to construct the graph model, and the relevance of each passage is then calculated on that graph. In view of the diversity of questions, a KNN-based question expansion method is also proposed. Multiple features are used to calculate the similarity between questions and to obtain the questions most similar to a given question. The original question is expanded using these similar questions, and candidate answer passages are retrieved with the expanded question model. A passage graph is then constructed in which the weight of the edge between two passages is their similarity score, calculated from content features. Finally, the graph-based model is used to recalculate the relevance scores of the answer passages, and the ranking parameter is trained with a learning method.
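
The abstract does not give the formulas behind the weighted-average metric or the graph-based re-ranking, so the following Python sketch is only an illustration of the two ideas under stated assumptions. The function names, the example weights, and the `damping` parameter are hypothetical, and the score propagation rule is a generic PageRank-style update swapped in for illustration, not the thesis's actual model.

```python
def combine_similarities(lexical, topic, structure, weights=(0.5, 0.3, 0.2)):
    """Weighted average of three similarity scores.

    The weights are illustrative placeholders, not values from the thesis.
    """
    total = sum(weights)
    return (weights[0] * lexical + weights[1] * topic + weights[2] * structure) / total


def graph_rerank(initial, sim, damping=0.5, iterations=50):
    """Re-score passages by propagating relevance over a similarity graph.

    initial: independent relevance score of each passage.
    sim: square matrix (list of lists) of passage-passage similarities.
    damping: hypothetical ranking parameter; 0 keeps the initial scores.
    """
    n = len(initial)
    # Normalize each passage's outgoing edge weights, ignoring self-loops.
    norm = []
    for i in range(n):
        row = [sim[i][j] if i != j else 0.0 for j in range(n)]
        row_sum = sum(row)
        norm.append([w / row_sum if row_sum > 0 else 0.0 for w in row])

    scores = list(initial)
    for _ in range(iterations):
        # Each passage keeps part of its own score and receives the rest
        # from the passages that are similar to it.
        scores = [
            (1 - damping) * initial[i]
            + damping * sum(norm[j][i] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


if __name__ == "__main__":
    # Three candidate passages with made-up similarity values.
    relevance = [
        combine_similarities(0.8, 0.6, 0.4),
        combine_similarities(0.3, 0.7, 0.2),
        combine_similarities(0.1, 0.2, 0.1),
    ]
    similarity = [
        [0.0, 0.9, 0.1],
        [0.9, 0.0, 0.2],
        [0.1, 0.2, 0.0],
    ]
    print(graph_rerank(relevance, similarity))
```

In this sketch a passage that is strongly connected to highly relevant passages gains score, which is one simple way to let relations between passages influence their independent relevance; in the thesis the ranking parameter is learned rather than fixed by hand.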
Keywords/Search Tags: Web, Question Answering, Passage Retrieval, Lexical Similarity, Topic Language Model, Structure Similarity, Relevance, Question Similarity, Graph-based Ranking Model