
Research Of Relevant Document Retrieval Technology For Question Answering System

Posted on: 2010-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Li
GTID: 2178360272485270
Subject: Computer application technology
Abstract/Summary:
Analysis of question answering (QA) systems shows that the relevance of the documents used as the source for answer extraction is the main factor limiting QA performance. Relevant document retrieval is a core component of a QA system, and the relevance of its search results to the question directly affects the effectiveness of answer extraction. Relevant document retrieval includes question classification, query expansion, ranking, etc.

In recent years, QA research has shifted from factoid questions, such as those asking for person names or location names, to complex questions such as definition questions and relation questions. This thesis focuses on query expansion and ranking algorithms for complex questions in QA. The main work of the thesis is as follows.

Firstly, existing query expansion methods that use query logs do not consider the authority of URLs. Combined with question type, a query expansion method based on query log authority is proposed. An authority factor is taken into account when calculating the degree of relevance between the query and web pages. Local Context Analysis is used to extract relevant terms from the relevant web pages, after which the query length is adjusted dynamically according to question type. The experimental results show that this method effectively improves query performance for biography questions and definition questions.

Secondly, for particular question types, the relevant documents often contain related characteristic words ("character-words"). Using the relation between Entity and Property in HowNet, pairs of question type and character-words are identified and used to construct a question type model.

Thirdly, a two-stage ranking algorithm based on the question type model is proposed. In the first stage, a multi-strategy aggregation algorithm is used to rank the documents; in the second stage, the N-best documents are re-ranked using the question type model. The experimental results show that this method effectively improves precision.

Finally, a complete relevant document retrieval system is built that searches documents for a given question. The query expansion method and ranking algorithm described above are used in the system, and the NTCIR-7 test set is used to evaluate its performance against Lucene. The experimental results show that the system performs better than Lucene on complex Chinese questions. As a next step, we will try to build a more precise question type model for re-ranking.
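The two-stage ranking idea can be illustrated with a minimal Python sketch. The abstract does not say which strategies are aggregated or how the question type model scores a document, so everything concrete here is an assumption made for illustration: a Borda count stands in for the multi-strategy aggregation, a simple count of character-words stands in for the question type model, and the names `borda_aggregate`, `rerank_top_n`, and the sample data are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Stage 1 (assumed): merge several ranked lists of document ids
    with a Borda count. The thesis's actual aggregation may differ."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            # The top document of a list of length n earns n points.
            scores[doc_id] += n - position
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank_top_n(ranked, doc_texts, cue_words, n):
    """Stage 2 (assumed): re-rank only the N-best documents by how many
    character-words of the question type they contain."""
    head, tail = ranked[:n], ranked[n:]

    def cue_hits(doc_id):
        text = doc_texts[doc_id]
        return sum(1 for word in cue_words if word in text)

    # sorted() is stable, so documents with equal hits keep stage 1 order.
    head = sorted(head, key=cue_hits, reverse=True)
    return head + tail


# Hypothetical data: three ranking strategies and a biography question.
rankings = [["d1", "d2", "d3"], ["d2", "d1", "d3"], ["d1", "d3", "d2"]]
doc_texts = {
    "d1": "a city in the north",
    "d2": "born in 1900, graduated from university, died in 1980",
    "d3": "born in a village",
}
biography_cues = ["born", "graduated", "died"]

stage1 = borda_aggregate(rankings)
stage2 = rerank_top_n(stage1, doc_texts, biography_cues, n=2)
print(stage1)
print(stage2)
```

Only the top `n` documents are reordered in the second stage, so a document ranked below the cutoff keeps its stage 1 position even if it contains character-words.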
Keywords/Search Tags: information retrieval, query log, question type model, ranking aggregation