
Research And Implementation Of Question Answer System Based On Information Extraction

Posted on: 2017-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: G Yu
Full Text: PDF
GTID: 2308330491452347
Subject: Computer application technology
Abstract/Summary:
The world is in an era of information explosion, and network resources are growing geometrically. People can easily obtain information from the Internet through search engines. However, traditional search engines search only by keywords and return a list of web pages, so they can no longer satisfy people's information retrieval needs. A question answering system combines information retrieval and natural language processing: it takes a natural language question as input, uses natural language processing to analyze the user's retrieval intention in depth, locates the answer in the knowledge base according to that intention, and finally returns the extracted answer directly rather than a pile of related web pages. A question answering system is therefore a better way to meet information retrieval needs.

This thesis studies the key technologies of question answering systems and implements a question answering system based on information extraction. The main work is as follows:

Firstly, the information extraction engine. The extraction engine is divided into two parts: natural language processing and information extraction. Natural language processing performs word segmentation, part-of-speech tagging, semantic analysis, etc., and information extraction performs named entity recognition and entity relation extraction. Named entities and entity relations are very important in question analysis and answer extraction, so the information extraction engine is used throughout the question answering system.

Secondly, question analysis. Using keyword extraction, named entity recognition, information extraction, etc., questions are divided into three categories: entity relation, entity, and keyword. Entity-relation questions and entity questions can be further divided into more detailed types according to their specific categories. Stop words are removed using a stop list, and keywords are expanded using a synonym dictionary.

Thirdly, answer extraction. A back-off strategy is proposed: phrase and sentence answer sets are obtained according to the entity relation type and the entity type, and the best answer is selected using basic features (shared keyword frequency, keyword spacing, longest string match), named entities, and entity relations. For entity-relation questions, answers are obtained by matching the question against candidate answers in the form of relation triples.

Fourthly, system implementation. A question answering system based on information extraction is designed and implemented. The system uses Lucene for indexing and searching and is built on the Hadoop platform to improve indexing and search speed.
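
The keyword processing and the three basic ranking features named in the abstract (shared keyword frequency, keyword spacing, longest string match) could be sketched roughly as below. This is only an illustration under stated assumptions, not the thesis's implementation: the stop list, the synonym dictionary, the whitespace tokenizer, the feature weights, and every function name are hypothetical, and the named-entity and entity-relation features are left out.

```python
from difflib import SequenceMatcher

# Hypothetical stop list and synonym dictionary (assumptions for illustration;
# the abstract does not say which resources the thesis uses).
STOP_WORDS = {"the", "a", "an", "of", "is", "was", "who", "what", "in", "by"}
SYNONYMS = {"founded": {"established", "created"}}


def tokenize(text):
    """Naive whitespace tokenizer; a real system would use word segmentation."""
    return [t.strip("?.,!").lower() for t in text.split() if t.strip("?.,!")]


def extract_keywords(question):
    """Drop stop words, then expand the remaining keywords with synonyms."""
    keywords = {t for t in tokenize(question) if t not in STOP_WORDS}
    for word in list(keywords):
        keywords |= SYNONYMS.get(word, set())
    return keywords


def keyword_frequency(keywords, tokens):
    """Number of candidate tokens that are question keywords."""
    return sum(1 for t in tokens if t in keywords)


def keyword_spacing(keywords, tokens):
    """Token distance between the first and last matched keyword.

    Falls back to the candidate length when no keyword matches.
    """
    positions = [i for i, t in enumerate(tokens) if t in keywords]
    return positions[-1] - positions[0] if positions else len(tokens)


def longest_match(question, candidate):
    """Length of the longest common contiguous substring (case-insensitive)."""
    q, c = question.lower(), candidate.lower()
    matcher = SequenceMatcher(None, q, c, autojunk=False)
    return matcher.find_longest_match(0, len(q), 0, len(c)).size


def score(question, candidate, weights=(1.0, 0.5, 0.1)):
    """Combine the three features; the weights are arbitrary placeholders."""
    keywords = extract_keywords(question)
    tokens = tokenize(candidate)
    w_freq, w_space, w_match = weights
    spacing_penalty = keyword_spacing(keywords, tokens) / max(len(tokens), 1)
    return (w_freq * keyword_frequency(keywords, tokens)
            - w_space * spacing_penalty
            + w_match * longest_match(question, candidate))


def best_answer(question, candidates):
    """Return the candidate sentence with the highest combined score."""
    return max(candidates, key=lambda c: score(question, c))


if __name__ == "__main__":
    question = "Who founded Microsoft?"
    candidates = [
        "Microsoft was established by Bill Gates and Paul Allen.",
        "Apple makes phones in California.",
    ]
    print(best_answer(question, candidates))
```

In this sketch a tighter keyword span lowers the penalty, so candidates whose matched keywords sit close together rank higher; how the thesis actually weights or combines the features is not stated in the abstract.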
Keywords/Search Tags: question answer system, information extraction, entity relation, named entity, back-off strategy