Research On The Automatic Construction Of A Knowledge Base For Chinese Question-Answering Systems

Posted on:2016-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Li

Full Text:PDF

GTID:2308330461994291

Subject:Computer application technology

Abstract/Summary:

With the development of the Internet, the quantity of data on the Web keeps growing, and question-answering systems play an important role in people's lives. Knowledge bases for current question-answering systems are mainly constructed by hand, which costs considerable manpower and material resources and hinders the expansion of question-answering systems from a single field to all fields. Therefore, building on previous research results, this thesis focuses on the organization and construction of entities in the lexicon. It proposes a word-adjacent co-occurrence algorithm that extends the domain keyword library, and combines it with a custom semantic dictionary and keyword extraction technology. It also proposes the SWR algorithm, which is based on the relationships between words. By extracting subject words and characteristic words from paragraphs, it constructs a knowledge base that is automatically annotated with subject words and characteristic words.

The main content of this thesis is as follows:
(1) Information display sites are selected as the data source for keyword extraction. A candidate keyword library is obtained by using mutual information and filtering with word-building rules. The word-adjacent co-occurrence algorithm then extends the candidate keyword library, and the domain dictionary is constructed based on HowNet.
(2) Current paragraph keyword extraction algorithms are studied, and the SWR algorithm, based on the relationships between words, is proposed. The subject words and characteristic words that describe each paragraph are extracted to construct the domain knowledge base.
(3) An implementation framework for the Chinese question-answering system knowledge base is built to validate the correctness of the proposed method.

The innovations of this thesis are:
(1) A word-adjacent co-occurrence algorithm is proposed for extracting domain keywords. A candidate keyword library is obtained by using mutual information and filtering with word-building rules. According to the characteristics of the Web, words in the candidate library are selected as guiding words, and the keyword base is extended by mining their adjacent words. The results show that accuracy and recall improve effectively.
(2) The SWR algorithm, based on the relationships between words, is proposed. A word's final score is divided into two parts: its own weight and its voting weight. "Semantic Relevancy computation based on HowNet" is used as the word weight matrix, which adds semantic relationships to keyword extraction, and word frequency is added to a word's own weight. The algorithm effectively improves the accuracy of subject word and characteristic word extraction, making the construction of the knowledge base more scientific and reasonable.

Based on the results above, this thesis designs and completes the system framework of Shandong University of Finance and Economics, and constructs two bases: a domain entity semantic dictionary and a question-answering system knowledge base. The experimental results show that the subject word and characteristic word extraction algorithm performs well and can be used to build a knowledge base. Constructing a knowledge base with subject words and characteristic words has broad application prospects for information websites.
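
The abstract does not give formulas, so the following is only a minimal sketch of how a mutual-information filter over adjacent words might produce a candidate keyword list. It assumes the text has already been segmented into words, uses the standard pointwise mutual information definition, and the function names (`adjacent_pmi`, `candidate_pairs`) and thresholds (`min_pmi`, `min_count`) are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def adjacent_pmi(sentences):
    """Map each adjacent word pair to (count, pointwise mutual information).

    `sentences` is a list of already-segmented sentences (lists of words).
    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), estimated from raw counts.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    result = {}
    for (left, right), count in pair_counts.items():
        p_pair = count / total_pairs
        p_left = word_counts[left] / total_words
        p_right = word_counts[right] / total_words
        result[(left, right)] = (count, math.log2(p_pair / (p_left * p_right)))
    return result


def candidate_pairs(sentences, min_pmi=1.0, min_count=2):
    """Keep adjacent pairs that are both frequent enough and strongly associated.

    The thresholds are illustrative assumptions; the thesis additionally
    filters by word-building rules, which this sketch does not model.
    """
    stats = adjacent_pmi(sentences)
    return sorted(
        pair
        for pair, (count, pmi) in stats.items()
        if count >= min_count and pmi >= min_pmi
    )


if __name__ == "__main__":
    corpus = [
        ["knowledge", "base"],
        ["knowledge", "base"],
        ["question", "answering"],
    ]
    print(candidate_pairs(corpus))
```

Pairs that survive the filter would serve as candidate keywords; the word-adjacent co-occurrence step described in the abstract would then grow this list by examining words that appear next to the candidates.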

Keywords/Search Tags:

Word-adjacent co-occurrence algorithm, Knowledge base, Domain dictionary, Subject words, Characteristic words

Related items

1	The Research And Implementation Of The System For Chinese Word Segmentation Base On Dictionary And Statistic
2	A New Words Extraction Method Based On Domain Specificity And Statistical Language Knowledge
3	Chinese Word Segmentation Method Based On Dictionary And Statistics Of The Words
4	Natural Language Processing, Words Related To Knowledge No Guide For Build And Balanced Classifier
5	Based On Dictionary And Word Frequency Analysis Of The Unknown Words From The Bbs Of Corpus Recognition Research
6	Improvement And Implementation Of Chinese Word Segmentation Algorithm Based On Dictionary
7	Research On Compound Word Extraction Based On Location Tag
8	Based Dictionary Intelligent Segmentation System
9	Research Of Word Semantic Similarity Based On Domain Knowledge
10	Research About Micro-blog Hot Topics Discovery Based On Optimized TF-TDF And Word Co-occurrence Model