
The Research And Implementation Of Automatic Question Answering System Based On Wikipedia

Posted on: 2018-02-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X D Wang
GTID: 2348330536979920 | Subject: Software engineering
Abstract/Summary:
With the rapid development of science, technology, and the Internet, people are surrounded by massive amounts of digital information. Finding the required information quickly and efficiently in such massive data is an urgent problem, and automatic question answering (QA) systems have emerged in response. A QA system accepts a user's question directly in natural language and returns a concise, accurate answer. Current QA systems have two problems: (1) the keyword matching method considers only how often keywords occur in a sentence and ignores both the semantics of the keywords and the semantic relationships between them, so the returned answers are often unrelated to the user's question; (2) they cannot provide different answers according to users' different needs regarding the abstraction level of the answer.

This paper designs and implements an automatic question answering system based on a single document (SingleDoc), using Wikipedia. The SingleDoc system accepts two types of questions: "What is A", where A is a specific domain concept or term, and "Who/When (interrogative) + event/fact". The user inputs a question and specifies a single paper and a domain, and the system extracts the answers from the specified paper. The SingleDoc system first analyzes the question type. Then, based on the specified text and the domain keywords, it uses the Domain Category Space Extraction (DCSE) algorithm to extract categories, and the relationships between categories, from Wikipedia to construct the Domain Category Space (DCS). The DCS describes the background classification knowledge in which the question is situated. Finally, the system applies a semantic distance algorithm within the DCS to select the sentences in the text that can serve as answers. In addition, the SingleDoc system can reorganize the answer sentences according to the user's needs regarding the abstract concept level of the answer, so that it provides a personalized answer for each user.

The main work is as follows:
(1) This paper proposes the DCSE algorithm, which extracts the relevant categories, and the relationships between them, from Wikipedia to construct the DCS.
(2) This paper proposes the SingleDoc system's answer extraction algorithm based on the DCS. Experiments show that the SingleDoc system is more accurate than a QA system based on keyword matching. For the "What is A" type, the accuracy reaches 58%; for the "When/Who/Where + event/fact" type, it reaches about 80%.
(3) This paper proposes a requirement model that describes the abstraction level of the concepts in an answer, which is used to provide personalized answers that meet the user's abstraction-level needs.
(4) This paper designs and implements a Web-based QA system. The system consists of a data layer, an interface layer, an application layer, and a view layer, and it implements the construction of the DCS and the calculation of sentence distances. Users enter a question as a natural-language sentence following the question-type templates, together with a single text and domain keywords. The system provides two types of answers: common answers and hierarchy answers.
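The abstract does not specify how DCSE or the semantic distance algorithm work internally, so the following Python sketch only illustrates the general idea under loudly stated assumptions. It is not the thesis's method. The category graph (`PARENTS`) and the term-to-category table (`TERM_CATEGORIES`) are hypothetical in-memory stand-ins for real Wikipedia category data. The "domain category space" is approximated by a breadth-first expansion from seed categories up to a depth limit (`max_depth`, an assumed parameter), and sentence distance is assumed to be the average, over the question's categories, of the shortest-path length to the nearest category of a candidate sentence.

```python
import re
from collections import deque

INF = float("inf")

# Hypothetical toy fragment of a Wikipedia-style category graph:
# each category maps to its parent categories (not real Wikipedia data).
PARENTS = {
    "Neural networks": ["Machine learning algorithms"],
    "Machine learning algorithms": ["Machine learning"],
    "Supervised learning": ["Machine learning"],
    "Machine learning": ["Artificial intelligence"],
    "Artificial intelligence": ["Computer science"],
    "Cooking": ["Food and drink"],
}

# Hypothetical mapping from single-word terms to categories.
TERM_CATEGORIES = {
    "perceptron": {"Neural networks"},
    "classifier": {"Supervised learning"},
    "learning": {"Machine learning"},
    "recipe": {"Cooking"},
}


def build_dcs(seed_categories, parents, max_depth=2):
    """Assumed DCS construction: BFS upward from the seed categories, keeping
    categories within max_depth hops and the parent edges between them
    (stored as an undirected adjacency map)."""
    adj = {c: set() for c in seed_categories}
    depth = {c: 0 for c in seed_categories}
    queue = deque(seed_categories)
    while queue:
        cat = queue.popleft()
        if depth[cat] == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in parents.get(cat, []):
            adj.setdefault(cat, set()).add(parent)
            adj.setdefault(parent, set()).add(cat)
            if parent not in depth:
                depth[parent] = depth[cat] + 1
                queue.append(parent)
    return adj


def category_distance(adj, src, dst):
    """Unweighted shortest-path length between two categories in the DCS."""
    if src not in adj or dst not in adj:
        return INF
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in adj[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return INF


def categories_of(text, term_categories):
    """Collect the categories of every known term occurring in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    cats = set()
    for term, term_cats in term_categories.items():
        if term in words:
            cats.update(term_cats)
    return cats


def sentence_distance(adj, question_cats, sentence_cats):
    """Assumed measure: mean over question categories of the distance
    to the closest sentence category."""
    if not question_cats or not sentence_cats:
        return INF
    total = 0.0
    for q in question_cats:
        total += min(category_distance(adj, q, s) for s in sentence_cats)
    return total / len(question_cats)


def rank_sentences(question, sentences, adj, term_categories, top_k=1):
    """Return up to top_k sentences with the smallest finite distance."""
    q_cats = categories_of(question, term_categories)
    scored = []
    for sentence in sentences:
        d = sentence_distance(adj, q_cats, categories_of(sentence, term_categories))
        if d != INF:
            scored.append((d, sentence))
    scored.sort(key=lambda pair: pair[0])
    return [sentence for _, sentence in scored[:top_k]]


SENTENCES = [
    "A perceptron is a simple neural classifier.",
    "Supervised learning trains a classifier on labeled data.",
    "This recipe needs two eggs.",
]

if __name__ == "__main__":
    dcs = build_dcs(["Neural networks", "Supervised learning"], PARENTS, max_depth=2)
    print(rank_sentences("What is a perceptron?", SENTENCES, dcs, TERM_CATEGORIES, top_k=2))
```

In this toy run, the off-domain sentence about a recipe is dropped because its category never enters the domain category space, which mirrors the abstract's motivation for going beyond plain keyword frequency. A real system would need actual Wikipedia category data and whatever distance definition the thesis itself uses.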
Keywords/Search Tags: Automatic Question Answering System (QA), Wikipedia, Sentence distance, Domain category space