Research On Natural Language Question Answering For Large-scale Multi-domains Knowledge Base

Posted on:2016-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhu

Full Text:PDF

GTID:2308330461472021

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer networks, we have entered an era of information explosion. Extracting the information users need from a vast amount of unstructured data has become an increasingly important problem, and natural language question answering systems have emerged in response. Such a system answers questions posed in natural language efficiently and accurately.

The subject of this thesis is natural language question answering over a large-scale, multi-domain knowledge base. The thesis studies five kinds of natural language questions: people, organizations, geography, music, and movies. In addition to constructing the knowledge base, it aims to understand the semantics of natural language questions efficiently and accurately. Unlike a traditional search engine, a natural language question answering system does not simply match combinations of keywords; it tries to understand the user’s intent. It therefore faces two main difficulties. One is building a large-scale knowledge base together with an efficient query system. The other is analyzing and understanding natural language questions to infer the user’s intent, since the input to the system is everyday natural language.

To address these problems, the thesis proposes solutions in four areas:

First, building the knowledge base. The thesis studies knowledge storage models in depth. Because the storage model must support large-scale data storage, efficient queries, and knowledge reasoning, the thesis adopts the RDF storage model. The RDF knowledge base is built with data from Baike and Douban.

Second, named entity recognition, which is required for understanding natural language questions. The thesis studies two machine-learning methods for named entity recognition: a support vector machine (SVM) model and a conditional random field (CRF) model. It also investigates feature selection for each model, examines how different feature templates and different statistical models affect recognition accuracy, and proposes its own feature selection method and statistical model.

Third, understanding natural language questions. After question classification, word segmentation, named entity recognition, and POS tagging, the thesis uses the resulting category, entity, and POS information to build a semantic graph of the question that describes the user’s intent. Because an entity may have several different Chinese expressions, the thesis also proposes methods for entity disambiguation and property-word disambiguation.

Fourth, building the query mechanism. Since the RDF knowledge base is queried with the SPARQL query language, the thesis builds a mechanism that generates SPARQL queries automatically.

To evaluate the approach, the thesis uses voice questions from a mobile phone assistant as the experimental dataset and builds the RDF knowledge bases from Baike and Douban data. Experimental results show that the RDF knowledge base is efficient and well suited to knowledge reasoning, that the question understanding method works well at analyzing and understanding the user’s intent, and that the question answering system returns answers accurately.
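The abstract does not give the details of the query mechanism. As a rough illustration of the general idea only, the following Python sketch turns a small question graph, written as triple patterns, into a SPARQL `SELECT` query. Everything in it is an assumption made for illustration and is not taken from the thesis: the `TriplePattern` class, the `build_sparql` function, the `ex:` prefix, the `http://example.org/kb/` namespace, and the property names `title` and `director`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    """One edge of a (hypothetical) question graph: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def format_term(term: str, prefix: str) -> str:
    """Render a term: variables and quoted literals as-is, other names as prefixed names."""
    if term.startswith("?") or term.startswith('"'):
        return term
    return f"{prefix}:{term}"


def build_sparql(patterns, target, namespace="http://example.org/kb/", prefix="ex"):
    """Build a SPARQL SELECT query string from triple patterns.

    `target` is the variable the question asks for, e.g. "?answer".
    The namespace and prefix are placeholders, not the thesis's vocabulary.
    """
    if not target.startswith("?"):
        raise ValueError("target must be a SPARQL variable such as '?answer'")
    lines = [f"PREFIX {prefix}: <{namespace}>", f"SELECT DISTINCT {target} WHERE {{"]
    for p in patterns:
        s = format_term(p.subject, prefix)
        pr = format_term(p.predicate, prefix)
        o = format_term(p.obj, prefix)
        lines.append(f"  {s} {pr} {o} .")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical graph for a question like "Who directed the movie Hero?"
question_graph = [
    TriplePattern("?movie", "title", '"Hero"'),
    TriplePattern("?movie", "director", "?answer"),
]
query = build_sparql(question_graph, "?answer")
print(query)
```

In a sketch like this, entity and property-word disambiguation would take place before query generation, so that each name in the graph already refers to a single resource or property in the knowledge base.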

Keywords/Search Tags:

RDF knowledge base, Named entity recognition, Semantic graph, Disambiguation, SPARQL query

Related items

1	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
2	Research On Graph Based Named Entity Disambiguation
3	Named Entity Disambiguation Based On Chinese And English Wikipedia Knowledge Base
4	Research On Chinese Instruction Parsing For Home Service Robot
5	Chinese Named Entity Recognition And Disambiguation Research
6	Research And Application Of The Chinese Organization Names Recognition And Disambiguation
7	Question Understanding Based On Graph Matching In Question Answering Over Knowledge Base
8	The Task Of Building Fishery Knowledge Base Based On Wikipedia
9	Cross-Lingual Entity Linking And Semantic Query Processing Based On Knowledge Graphs
10	Research On The Construction Of Knowledge Graph Of Segmentation Domain Under The Guidance Of Small-scale Knowledge Base