
Research On Application Of Wikipedia In IR4QA System

Posted on: 2013-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhou
GTID: 2248330374480170
Subject: Computer application technology
Abstract/Summary:
A question answering (QA) system is a new generation of intelligent search engine: it accepts questions posed in natural language and returns accurate answers to the user. Compared with traditional search engines, a QA system therefore better satisfies the user's query and retrieves the answers the user needs more accurately. This thesis is based mainly on the work done for NTCIR-8. It studies the question understanding and information retrieval stages of a Chinese QA system, that is, the IR4QA (Information Retrieval for Question Answering) phase, and implements the resulting system.
Question understanding, which is closely related to research on natural language interfaces, is the first stage of a QA system, so its analysis results have a significant impact on every later stage. Information retrieval runs in the middle of the QA pipeline, and its output largely determines the quality of the downstream modules. By comparing and analyzing the problems these two stages face in typical QA systems, we identify more effective methods and apply them in our system. Building on previous studies, this thesis makes the following contributions:
(1) We survey and analyze research on automatic question answering and search engine technology in China and abroad. Search engine users often face cluttered results, long search times, and limited accuracy; to address these problems, we combine the strengths of both kinds of system, apply Wikipedia to automatic question answering, and design and implement a Wikipedia-based IR4QA system.
(2) By analyzing the system to be built, we developed a set of practical methods in the early design stage. Based on these methods and on a layered, modular design philosophy, we established the system's design principles and divided the system into five modules: index generation, question analysis, query expansion, document retrieval, and document re-ranking.
(3) We discuss the key technologies involved in the system and the theoretical and technical difficulties encountered during implementation, and propose practical solutions to them.
(4) For question classification, we focus on the characteristics of the questions themselves. Because full syntactic and semantic analysis of Chinese is a very demanding task, the system does not use the machine-learning classifiers common in English QA systems; instead, it identifies the question type with heuristic rules based on the question's wh-word (interrogative word). Since the target questions have simple syntax, this approach achieves good recognition results.
(5) To address term mismatch, we use a Wikipedia-based query expansion method with three steps: retrieving the relevant Wikipedia page, locating the relevant paragraphs, and selecting expansion terms. Comparative experiments show that this method effectively improves the quality of the retrieval results.
(6) To further improve retrieval accuracy, the document re-ranking module re-ranks the retrieved documents with BM25; the re-ranked list is the final retrieval result.
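Item (4) describes classifying questions with heuristic rules keyed on the wh-word rather than with a machine-learned classifier. A minimal sketch of such a rule table follows; the specific Chinese interrogatives and the answer-type labels (PERSON, DATE, and so on) are hypothetical illustrations, not the thesis's actual rule set or the NTCIR-8 question taxonomy.

```python
# Hypothetical rule table mapping a Chinese interrogative (wh-word) to an
# answer type. Rules are tried in list order, so longer interrogatives such as
# "什么时候" (when) and "为什么" (why) come before their substring "什么" (what).
WH_RULES = [
    ("什么时候", "DATE"),   # when
    ("何时", "DATE"),       # when (formal)
    ("哪一年", "DATE"),     # which year
    ("为什么", "REASON"),   # why
    ("哪里", "LOCATION"),   # where
    ("哪儿", "LOCATION"),   # where (colloquial)
    ("多少", "NUMBER"),     # how many / how much
    ("谁", "PERSON"),       # who
    ("什么", "OTHER"),      # what (generic fallback)
]


def classify_question(question: str) -> str:
    """Return the answer type of the first rule whose wh-word occurs in the question."""
    for wh_word, answer_type in WH_RULES:
        if wh_word in question:
            return answer_type
    return "UNKNOWN"  # no interrogative recognized


for q in ["鲁迅是谁？", "北京奥运会是什么时候举行的？", "为什么天空是蓝色的？"]:
    print(q, classify_question(q))
```

Rule order encodes priority: a longer interrogative such as 什么时候 must be checked before its substring 什么, otherwise "when" questions would fall through to the generic type.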
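Item (5) splits Wikipedia-based query expansion into three steps: finding the Wikipedia page, locating the relevant paragraphs, and selecting expansion terms. The sketch below mirrors those steps under assumptions of its own: the Wikipedia data is a hypothetical in-memory dict (`WIKI`) mapping page titles to paragraphs that are already word-segmented, paragraphs are ranked by how many distinct query terms they contain, and expansion terms are chosen by raw frequency. The abstract does not specify the thesis's actual scoring. English tokens are used for readability; Chinese text would first need word segmentation.

```python
from collections import Counter

# Hypothetical sample data: page title -> list of paragraphs, each already tokenized.
WIKI = {
    "aspirin": [
        ["aspirin", "is", "a", "medication", "used", "to", "reduce", "pain", "fever"],
        ["aspirin", "inhibits", "cyclooxygenase", "reduce", "inflammation", "pain"],
        ["history", "bayer", "trademark"],
    ],
}
STOPWORDS = frozenset({"is", "a", "to", "used"})


def find_page(query_terms, wiki):
    """Step 1: return a title equal to a query term, else the page sharing most terms."""
    for term in query_terms:
        if term in wiki:
            return term

    def overlap(title):
        words = {w for para in wiki[title] for w in para}
        return len(words & set(query_terms))

    best = max(wiki, key=overlap, default=None)
    return best if best is not None and overlap(best) > 0 else None


def locate_paragraphs(query_terms, paragraphs, k=2):
    """Step 2: keep up to k paragraphs that contain the most distinct query terms."""
    q = set(query_terms)
    scored = [(len(q & set(p)), i) for i, p in enumerate(paragraphs)]
    scored = [(s, i) for s, i in scored if s > 0]  # drop paragraphs with no query term
    scored.sort(key=lambda si: (-si[0], si[1]))  # best first, document order on ties
    return [paragraphs[i] for _, i in scored[:k]]


def select_terms(query_terms, paragraphs, n=3, stopwords=frozenset()):
    """Step 3: the n most frequent non-query, non-stopword terms (ties: alphabetical)."""
    counts = Counter(
        w for p in paragraphs for w in p if w not in query_terms and w not in stopwords
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


def expand_query(query_terms, wiki, k=2, n=3, stopwords=frozenset()):
    """Original query terms followed by the selected expansion terms."""
    title = find_page(query_terms, wiki)
    if title is None:
        return list(query_terms)  # no matching page: leave the query unchanged
    paragraphs = locate_paragraphs(query_terms, wiki[title], k)
    return list(query_terms) + select_terms(query_terms, paragraphs, n, stopwords)


print(expand_query(["aspirin", "pain"], WIKI, stopwords=STOPWORDS))
```

With the sample data, the expanded query gains `reduce`, `cyclooxygenase`, and `fever`: `reduce` occurs in both selected paragraphs, and ties among the remaining terms are broken alphabetically so the output is deterministic.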
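Item (6) re-ranks the retrieved documents with BM25. The abstract gives no parameters, so the sketch below assumes the common defaults k1 = 1.2 and b = 0.75, uses the non-negative IDF variant ln(1 + (N - n_q + 0.5)/(n_q + 0.5)), where N is the number of candidates and n_q the number of candidates containing term q, and computes all collection statistics over the retrieved candidate list only. A real system might instead take these statistics from the whole indexed corpus.

```python
import math
from collections import Counter


def bm25_rerank(query_terms, docs, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs sorted by Okapi BM25 score, highest first.

    N, avgdl and document frequencies are computed over the candidate list
    itself (an assumption); ties keep the original retrieval order.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()
    for d in docs:
        df.update(set(d))  # count each term once per document

    def score(doc):
        tf = Counter(doc)
        total = 0.0
        for q in set(query_terms):  # repeated query terms are counted once
            if tf[q] == 0:
                continue
            idf = math.log(1.0 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1.0 - b + b * len(doc) / avgdl)
            total += idf * tf[q] * (k1 + 1.0) / norm
        return total

    scored = [(i, score(d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: -pair[1])  # stable sort keeps ties in order


candidates = [
    ["history", "of", "the", "olympics"],
    ["beijing", "hosted", "the", "olympics", "in", "2008"],
    ["beijing", "is", "the", "capital"],
]
for index, s in bm25_rerank(["beijing", "olympics"], candidates):
    print(index, round(s, 3))
```

Computing statistics over the candidates keeps the re-ranker independent of the index module, at the cost of noisier IDF values when the candidate list is short.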
Keywords/Search Tags: Question Answering System, IR4QA, Query Expansion, Wikipedia, Question Analysis