
Design And Implementation Of Chinese Question Answering System Based On Small Search Engine

Posted on: 2015-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: G H Yin
GTID: 2268330428484177
Subject: Software engineering
Abstract/Summary:
Nowadays, retrieving information from the Internet is part of daily life. However, traditional search engines have many shortcomings that harm the user experience. For example, a keyword-based search engine cannot capture the user's query intent; as a result, it returns a large number of related pages containing much useless information, and it is difficult for users to find the information they need quickly and accurately. Question answering systems were created to overcome these shortcomings. They let users ask questions in natural language and return short, precise answers instead of many related pages, so research on question answering systems is of great significance. However, because of the special nature of Chinese characters and the complexity of Chinese information processing, Chinese question answering is considerably more difficult, and Chinese question answering systems are less mature than those for other languages. In-depth research is therefore necessary.

In this thesis, we design and implement a fully functional question answering system built on a simple search engine. A question answering system has three parts: question analysis, information retrieval, and answer extraction. Although search engines have their shortcomings, their document collection capability is a key technology for question answering. Many mature large-scale search engines already exist, such as Google, Baidu, and Youdao; even so, we studied the architecture and implementation methods of search engines and built a small search engine in our lab environment. "Small" does not mean that functionality is omitted; it refers mainly to two aspects: the scale of web collection and the storage space used. The search engine design includes a detailed design of three subsystems: the collection subsystem, the indexing subsystem, and the retrieval subsystem.
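The indexing and retrieval subsystems can be illustrated with a minimal inverted index. This is a sketch under stated assumptions, not the thesis's implementation: documents are assumed to be already segmented into space-separated words (Chinese text needs word segmentation before indexing), TF-IDF is used as a stand-in relevance score, and the class `InvertedIndex` and its methods are hypothetical names chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Hypothetical toy indexing + retrieval subsystem (not the thesis's code).

    Assumes each document is already word-segmented, with words separated by spaces.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add_document(self, doc_id, segmented_text):
        # Indexing: record how often each term occurs in this document.
        self.doc_count += 1
        for term, tf in Counter(segmented_text.split()).items():
            self.postings[term][doc_id] = tf

    def search(self, segmented_query, top_k=3):
        # Retrieval: accumulate a TF-IDF score for every document sharing a query term.
        scores = defaultdict(float)
        for term in set(segmented_query.split()):
            docs = self.postings.get(term)  # .get() does not create empty entries
            if not docs:
                continue
            idf = math.log(self.doc_count / len(docs)) + 1.0
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id for a deterministic order.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [doc_id for doc_id, _ in ranked[:top_k]]


index = InvertedIndex()
index.add_document("d1", "长城 位于 中国 北方")
index.add_document("d2", "故宫 在 北京")
index.add_document("d3", "长城 全长 两万 多 公里")
print(index.search("长城 位于 哪里"))  # ['d1', 'd3']
```

Here `d1` ranks first because it matches both 长城 and the rarer term 位于, while `d2` is never scored since it shares no term with the query. A real collection subsystem would feed crawled, de-duplicated pages into `add_document`.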
The key techniques and algorithms of the search engine include parallel crawling, a heuristic gathering strategy, mirror elimination (removing duplicate mirrored pages), efficient indexing, and relevance evaluation strategies, among others.

This thesis also studies how to extract answers from the relevant web pages and documents that are retrieved. Answer extraction is one of the core modules of a question answering system, and the extraction method directly determines the system's performance. We adopt a similarity calculation method based on the semantic dependency tree, which combines semantic and syntactic structure to compute the similarity between the question and each candidate answer sentence; the answers returned to the user are selected by comparing these similarity scores. The experiments follow the TREC evaluation criteria. Answer extraction achieves an average MRR of 0.6936 on the factoid question set and 0.6415 on the definition question set. The experiments show that a question answering system built on a small search engine is workable, and that the answer extraction method achieves relatively high MRR and accuracy.
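A minimal sketch of similarity-based answer selection and MRR scoring follows. It is a simplified stand-in, not the thesis's semantic dependency tree method: each sentence is assumed to be represented as a set of dependency arcs (head, relation, dependent) produced by some external Chinese parser (written by hand below, with illustrative relation labels), and the score mixes arc overlap (structure) with word overlap (content) using a hypothetical weight `alpha`. The MRR function assumes the top-5 cutoff used in early TREC QA tracks; all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Arc = Tuple[str, str, str]  # (head word, relation label, dependent word)


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(q_arcs: Set[Arc], c_arcs: Set[Arc], alpha: float = 0.5) -> float:
    """Hypothetical score: alpha * arc overlap + (1 - alpha) * word overlap."""
    q_words = {w for h, _, d in q_arcs for w in (h, d)}
    c_words = {w for h, _, d in c_arcs for w in (h, d)}
    return alpha * jaccard(q_arcs, c_arcs) + (1 - alpha) * jaccard(q_words, c_words)


def rank_candidates(q_arcs: Set[Arc], candidates: Dict[str, Set[Arc]]) -> List[str]:
    """Candidate sentence ids sorted by descending similarity to the question."""
    return sorted(candidates, key=lambda cid: -similarity(q_arcs, candidates[cid]))


def mean_reciprocal_rank(rankings: Sequence[List[str]], gold: Sequence[Set[str]],
                         cutoff: int = 5) -> float:
    """Average of 1/rank of the first correct answer in the top `cutoff` (0 if none)."""
    total = 0.0
    for ranked, correct in zip(rankings, gold):
        for rank, cid in enumerate(ranked[:cutoff], start=1):
            if cid in correct:
                total += 1.0 / rank
                break
    return total / len(rankings) if rankings else 0.0


# Hand-written arcs for "长城 位于 哪里" and two candidate sentences.
question = {("位于", "subj", "长城"), ("位于", "obj", "哪里")}
candidates = {
    "c1": {("位于", "subj", "长城"), ("位于", "obj", "北方"), ("北方", "mod", "中国")},
    "c2": {("全长", "subj", "长城"), ("全长", "obj", "公里"), ("公里", "mod", "两万")},
}
ranked = rank_candidates(question, candidates)
print(ranked)  # ['c1', 'c2']
print(mean_reciprocal_rank([ranked, ["x2", "x1"]], [{"c1"}, {"x1"}]))  # 0.75
```

Candidate `c1` wins because it shares both a word pair and the subject arc with the question. In the MRR call, the first question is answered at rank 1 and the second at rank 2, giving (1 + 1/2) / 2 = 0.75. A fuller version would also match the question word (哪里) against the answer slot instead of requiring an exact arc.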
Keywords/Search Tags: Question answering system, natural language processing, search engine, answer extraction, similarity calculation