Research On Natural Language Question Answering For Large-scale Multi-domains Knowledge Base

Posted on: 2016-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhu
Full Text: PDF
GTID: 2308330461472021
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer networks, we have entered an era of information explosion. As extracting the information users need from vast amounts of unstructured data has become increasingly important, natural language question answering systems have emerged. Such a system answers questions posed in natural language efficiently and accurately.

The subject of this thesis is natural language question answering over a large-scale, multi-domain knowledge base. The thesis mainly studies five kinds of natural language questions: people, organizations, geography, music, and movies. In addition to constructing the knowledge base, it aims to understand the semantics of natural language questions efficiently and accurately. Unlike a traditional search engine, a natural language question answering system does not simply match combinations of keywords; it seeks to understand the user’s intent. It therefore faces several difficulties. One is building a large-scale knowledge base and an efficient query system. Another is analyzing and understanding natural language questions to infer the user’s intent, since the input to the system is everyday natural language.

To solve these problems, this thesis proposes a series of solutions, organized around the following four points.

First, building the knowledge base. The thesis studies knowledge storage models in depth. Because the storage model must support large-scale data storage, efficient queries, and knowledge reasoning, the thesis adopts the RDF storage model and builds an RDF knowledge base with data from Baike and Douban.

Second, studying named entity recognition, which is needed to understand natural language questions. Two machine-learning methods are studied for named entity recognition: an SVM statistical model and a CRF statistical model. The thesis also investigates feature selection for each statistical model, examines how different feature templates and different statistical models affect recognition accuracy, and proposes its own feature selection method and statistical model.

Third, understanding natural language questions. After preprocessing, in which each question is classified by category, word-segmented, tagged with named entities, and annotated with part-of-speech and category information, the thesis builds a semantic graph of the question to describe the user’s intention. Because a single entity may have several different Chinese expressions, the thesis also proposes methods for entity disambiguation and property-word disambiguation.

Fourth, building the query mechanism. Since the SPARQL query language is needed to query an RDF knowledge base, the thesis builds a mechanism that generates SPARQL queries automatically.

To verify the effectiveness of this approach, the thesis uses voice questions from a mobile phone assistant as the experimental dataset and data from Baike and Douban to build the RDF knowledge bases. Experimental results show that the RDF knowledge base is efficient and well suited to knowledge reasoning, that the question-understanding method works well at analyzing the user’s intention, and that the question answering system returns accurate answers.
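The step from a question's semantic graph to a SPARQL query can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Edge` class, the `to_sparql` function, the `kb:` prefix, the `http://example.org/kb/` namespace, and the entity and property names are all hypothetical, and the sketch assumes that entity and property disambiguation has already produced names that are valid SPARQL local names.

```python
from dataclasses import dataclass

# Hypothetical namespace; the abstract does not state which URIs the thesis uses.
PREFIX = "http://example.org/kb/"


@dataclass(frozen=True)
class Edge:
    """One edge of a question's semantic graph: subject --prop--> obj."""
    subject: str  # disambiguated entity name, or a variable starting with "?"
    prop: str     # disambiguated property name
    obj: str      # disambiguated entity name, or a variable starting with "?"


def _term(name: str) -> str:
    """Render a variable unchanged and an entity as a prefixed name."""
    return name if name.startswith("?") else f"kb:{name}"


def to_sparql(edges: list[Edge], target: str) -> str:
    """Turn semantic-graph edges into a SPARQL SELECT query for `target`."""
    patterns = "\n".join(
        f"  {_term(e.subject)} kb:{e.prop} {_term(e.obj)} ." for e in edges
    )
    return (
        f"PREFIX kb: <{PREFIX}>\n"
        f"SELECT DISTINCT {target} WHERE {{\n{patterns}\n}}"
    )


# Illustrative question: "Who directed the movie Inception?"
# The unknown answer becomes the variable ?x in the graph.
query = to_sparql([Edge("Inception", "director", "?x")], "?x")
print(query)
```

Each graph edge becomes one triple pattern, so a question that chains several relations simply contributes several edges sharing a variable.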
Keywords/Search Tags: RDF knowledge base, Named entity recognition, Semantic graph, Disambiguation, SPARQL query