
Research on the Automatic Construction of a Knowledge Base for Chinese Question-Answering Systems

Posted on: 2016-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Li
GTID: 2308330461994291
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the quantity of data on the Web keeps growing, and question-answering systems play an important role in people's lives. At present, the knowledge base of a question-answering system is mainly constructed by hand, which costs a great deal of manpower and material resources and hinders the expansion of question-answering systems from a single domain to all domains. Therefore, building on previous research results, this thesis focuses on the organization and construction of entities in the lexicon. It proposes a word-adjacent co-occurrence algorithm to extend the domain keyword library, combined with a custom semantic dictionary and keyword extraction technology. It also proposes the SWR algorithm, which is based on the relationships between words. Subject words and characteristic words are extracted from paragraphs, and a knowledge base is constructed in which the subject words and characteristic words are annotated automatically.

The main content of this thesis is as follows:
(1) Information display websites are selected as the data source for keyword extraction. Candidate keywords are obtained using mutual information and filtered by word-building rules to form the candidate keyword library, which is then extended with the word-adjacent co-occurrence algorithm. The domain dictionary is then constructed based on HowNet.
(2) Current paragraph keyword extraction algorithms are studied, and the SWR algorithm, based on the relationships between words, is proposed. The subject words and characteristic words that describe a paragraph are extracted to construct the domain knowledge base.
(3) An implementation framework for the knowledge base of a Chinese question-answering system is built to validate the correctness of the proposed method.

The innovations of this thesis are:
(1) A word-adjacent co-occurrence algorithm is proposed for extracting domain keywords. Mutual information, followed by filtering with word-building rules, yields the candidate keyword library. According to the characteristics of the Web, words in the candidate library are then selected as guiding words, and the keyword library is extended by mining the words adjacent to them. This effectively improves accuracy and recall.
(2) The SWR algorithm, based on the relationships between words, is proposed. The final score of a word has two parts: its own weight and its voting weight. The "Semantic Relevancy computation based on HowNet" is used as the word weight matrix, which adds semantic relationships to keyword extraction, and word frequency is added to the word's own weight. The algorithm effectively improves the accuracy of subject word and characteristic word extraction, making the construction of the knowledge base more scientific and reasonable.

Based on the results above, this thesis designs and implements the system framework for Shandong University of Finance and Economics and constructs two bases: a domain entity semantic dictionary and a question-answering system knowledge base. The experimental results show that the subject word and characteristic word extraction algorithm performs well and can be used to build a knowledge base. Constructing a knowledge base with subject words and characteristic words has broad application prospects for information websites.
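The candidate-keyword step described in the abstract (mutual information over adjacent words) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `adjacent_pmi` and `candidate_keywords`, the thresholds `min_pmi` and `min_count`, and the toy corpus are all assumptions made for illustration. The word-building rules, the word-adjacent co-occurrence extension, and the HowNet step are not modelled. The input is assumed to be already segmented into tokens; English tokens stand in for segmented Chinese text.

```python
import math
from collections import Counter


def adjacent_pmi(sentences):
    """Score each adjacent token pair by pointwise mutual information (PMI).

    `sentences` is a list of token lists; word segmentation is assumed to
    have been done beforehand by an external tool.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (left, right), count in bigrams.items():
        p_pair = count / n_bi
        p_left = unigrams[left] / n_uni
        p_right = unigrams[right] / n_uni
        # PMI = log2( p(x, y) / (p(x) * p(y)) )
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores, bigrams


def candidate_keywords(sentences, min_pmi=1.0, min_count=2):
    """Keep pairs that are both frequent enough and strongly associated.

    Both thresholds are hypothetical defaults, not values from the thesis.
    """
    scores, bigrams = adjacent_pmi(sentences)
    return sorted(
        pair
        for pair, pmi in scores.items()
        if pmi >= min_pmi and bigrams[pair] >= min_count
    )


corpus = [
    ["knowledge", "base", "construction"],
    ["knowledge", "base", "question"],
    ["question", "answering"],
]
print(candidate_keywords(corpus))
```

In this toy corpus only the pair ("knowledge", "base") occurs twice, so it is the only candidate that passes the frequency threshold. A real pipeline would apply the word-building rules to the surviving pairs before extending the library.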
Keywords/Search Tags: Word-adjacent co-occurrence algorithm, Knowledge base, Domain dictionary, Subject words, Characteristic words