
Research And Implementation Of Answer Extraction In Chinese Question Answering System

Posted on: 2011-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Huang
Full Text: PDF
GTID: 2178360305955054
Subject: Computer application technology
Abstract/Summary:
With the rapid development of artificial intelligence, there is a growing demand for computers to complete complicated tasks that previously required human intelligence. To fully understand human intentions, a computer must be able to process natural language, and question answering is one of the most widely used applications in natural language processing.
Traditional search engines help users obtain information from the web. Given a combination of a few keywords, a search engine returns a large number of texts and web pages related to the information the user needs. This volume of results is itself a burden, because users must make further effort to find the information they actually want. Moreover, keyword matching, though fast and easy, keeps the language processing at the surface level: deeper syntactic structure is not analyzed, so the results are often unsatisfactory and do not meet users' needs.
The question answering system was proposed against this background to overcome these disadvantages. Instead of issuing a keyword query, the user gives the whole question to the system, which uses natural language processing to analyze the question and returns an accurate, brief answer rather than a large number of related texts and web pages. A question answering system thus behaves like an expert that aims to answer the user's questions accurately and quickly.
Question answering systems have already appeared abroad and have achieved good results in practical applications. However, because Chinese syntactic structure is very complex and basic resources for Chinese natural language processing are relatively scarce, the techniques used in foreign question answering systems cannot be applied directly to Chinese question answering systems. Research on Chinese question answering is still developing, and many problems remain to be solved.
A Chinese question answering system mainly consists of three modules: the question analysis module, the information retrieval module, and the answer extraction module. The question analysis module determines the question type, extracts the question's keywords, and expands those keywords. The information retrieval module uses the question's keywords to query a search engine, which returns related documents or paragraphs. The answer extraction module extracts the correct answer from the candidate documents returned by the search engine. As the core of the system, the answer extraction module determines the system's overall performance, while the question analysis module directly decides the answer extraction strategy and provides technical support for it.
Against this background, this thesis studies answer extraction in Chinese question answering systems and implements it experimentally. Because the question analysis module affects the accuracy of answer extraction, the question analysis module is handled first. Question classification is the key task: it directly determines the answer type and the answer extraction algorithm. We propose a rule-based question classification method based on the interrogative word.
If the interrogative's value is positive, the question type can be read directly from the question classification table. If the interrogative's value is negative, we use the amended rules to find the noun that follows the interrogative, compute the type number, and then look up the question classification table to obtain the question type. This method is accurate for fact-type (factoid) questions. The answer extraction module uses a similarity algorithm, and two sentences must have the same word order to obtain a high similarity. Because questions have a special syntactic structure, we rewrite each question: words are moved according to the interdependent tree (that is, the dependency parse tree) to produce a new sentence with the same word order as the answer sentence.
The information retrieval module is an intermediate module that provides the candidate answers. This thesis does not study that module and simply uses a web search engine to obtain candidate texts.
The answer extraction module is the core of the question answering system and the focus of this thesis. After analyzing several answer extraction algorithms, we propose a new algorithm that extracts answers by calculating similarity. Because Chinese sentences have a special syntactic structure, we consider both the relationships between words and their semantics, and propose a matching algorithm based on the interdependent tree. The sentence's syntactic structure is analyzed to obtain its interdependent tree, and similarity is calculated for every layer of the tree, with each layer carrying a different weight. Within a layer, word similarity is calculated using a synonym forest (a synonym thesaurus); combining the layers' similarities gives the sentence similarity. Experiments show that this algorithm achieves higher accuracy.
We design a complete experimental process that uses existing techniques and the improved algorithm to implement answer extraction.
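The layer-by-layer similarity described above can be illustrated with a rough sketch. The abstract does not give the layer weights, the word-similarity values, or the way words within a layer are paired, so everything below is an assumption made for illustration: trees are written as `(word, [children])` tuples, layer weights decay geometrically (`decay`), words in the same synonym group score a hypothetical 0.8, and each layer is scored by best-match averaging. It is not the thesis's actual formula.

```python
from collections import defaultdict

# A tree node is (word, [child nodes]). This representation is an assumption.

def layers(tree):
    """Group the words of a tree by depth (root is layer 0)."""
    by_depth = defaultdict(list)
    stack = [(tree, 0)]
    while stack:
        (word, children), depth = stack.pop()
        by_depth[depth].append(word)
        for child in children:
            stack.append((child, depth + 1))
    return [by_depth[d] for d in sorted(by_depth)]

def word_similarity(a, b, synonyms):
    """1.0 for identical words, 0.8 (hypothetical) for the same synonym group, else 0.0."""
    if a == b:
        return 1.0
    group = synonyms.get(a)
    if group is not None and group == synonyms.get(b):
        return 0.8
    return 0.0

def layer_similarity(words1, words2, synonyms):
    """Average best match of words1 against words2, penalizing size mismatch."""
    if not words1 or not words2:
        return 0.0
    best = [max(word_similarity(a, b, synonyms) for b in words2) for a in words1]
    return sum(best) / max(len(words1), len(words2))

def tree_similarity(tree1, tree2, synonyms, decay=0.5):
    """Weighted sum of per-layer similarities; shallower layers weigh more (assumed)."""
    l1, l2 = layers(tree1), layers(tree2)
    depth = max(len(l1), len(l2))
    weights = [decay ** d for d in range(depth)]
    total = 0.0
    for d in range(depth):
        a = l1[d] if d < len(l1) else []
        b = l2[d] if d < len(l2) else []
        total += weights[d] * layer_similarity(a, b, synonyms)
    return total / sum(weights)

# Toy trees for a rewritten question and a candidate answer sentence.
question = ("发明", [("谁", []), ("电话", [])])
candidate = ("发明", [("贝尔", []), ("电话", [])])
score = tree_similarity(question, candidate, synonyms={})
```

In a real system the `synonyms` mapping would come from the synonym forest and the trees from a dependency parser; here both are supplied by hand.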
The experiment has three steps. First, in the question analysis module, we use the lexical analyzer provided by Harbin Institute of Technology to segment the sentence into words, classify the question with the rule-based method, extract keywords, expand the keywords with synonyms, and rewrite the question. Second, in the information retrieval module, we use a web search engine to obtain candidate texts. Finally, we use the matching algorithm based on the interdependent (dependency) tree to calculate similarity and return the answer. In our experimental results the algorithm's accuracy reaches 92%, which indicates that the interdependent-tree matching algorithm is significant for answer extraction.
There are still many deficiencies in this research, and research on Chinese question answering systems is still developing, so we will continue to improve this work in the future.
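The rule-based question classification step mentioned in the abstract might look roughly like the following sketch. The thesis's actual classification table and amended rules are not given here, so the tables `DIRECT`, `AMBIGUOUS`, and `NOUN_TYPES`, the type labels, and the reading of a "positive" interrogative as one that determines the type on its own are all hypothetical. The input is assumed to be already segmented into words.

```python
# Hypothetical tables: the abstract does not list the real classification table.
DIRECT = {          # interrogatives that fix the question type by themselves
    "谁": "PERSON",
    "哪里": "LOCATION",
    "何时": "TIME",
    "多少": "NUMBER",
}
AMBIGUOUS = {"什么", "哪个", "哪"}   # interrogatives that need the following noun
NOUN_TYPES = {      # nouns that reveal the type after an ambiguous interrogative
    "人": "PERSON",
    "城市": "LOCATION",
    "国家": "LOCATION",
    "时候": "TIME",
    "年": "TIME",
}

def classify(tokens):
    """Return a question type for a list of segmented words."""
    for i, token in enumerate(tokens):
        if token in DIRECT:
            # The interrogative alone determines the type.
            return DIRECT[token]
        if token in AMBIGUOUS:
            # Look for the first known noun after the interrogative.
            for following in tokens[i + 1:]:
                if following in NOUN_TYPES:
                    return NOUN_TYPES[following]
            return "OTHER"
    return "OTHER"

question_type = classify(["中国", "的", "首都", "是", "哪个", "城市"])
```

A lookup table keeps the rules easy to inspect and extend, which suits factoid questions where the interrogative and the noun after it usually carry the answer type.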
Keywords/Search Tags: Question Answering System, Question Classification, Answer Extraction, Similarity, Interdependent Tree