The Construction Of Knowledge Base Based On Chinese Encyclopedia

Posted on:2016-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:L F Wang

Full Text:PDF

GTID:2308330470467665

Subject:Computer application technology

Abstract/Summary:

In recent years, the mobile Internet, the Internet of Things, cloud computing, and related technologies have developed rapidly, new network applications have emerged one after another, and network data has grown explosively. How to derive valuable knowledge from such a large amount of data and make full use of it through deep computation and analysis has become a hot research topic. Currently, as many as 50 knowledge bases have been built in different countries, most of them based on English Wikipedia or other English resources. Yet Chinese encyclopedias (Baidu Encyclopedia, Hudong Encyclopedia, and Chinese Wikipedia) contain a large number of high-quality entries. This thesis builds a knowledge base from Chinese encyclopedias; its main work is as follows.

(1) This thesis designs and implements a multi-threaded web spider to download encyclopedia pages. We use a breadth-first approach to collect the URLs of pages and categories and then download the pages. After analyzing the structural features of the web pages, we use heuristics and other methods to extract semantic information from them.

(2) This thesis presents a method that uses the classification system of Hudong Encyclopedia to construct a concept hierarchy. The method extracts linguistic and semantic features of categories to train an AdaBoost model that extracts hyponymy relations between categories. We use these relations to construct the concept hierarchy automatically. The same method is used to extract the relationship between categories and entries.

(3) This thesis uses Conditional Random Fields (CRFs) to extract attribute values from the unstructured text of the encyclopedia. First, we identify attribute-value pairs from Hudong Encyclopedia pages that have infoboxes; these pairs in turn indicate which attributes deserve attention for different Hudong Encyclopedia entries. We then use a keyword-matching approach to identify candidate sentences for each attribute in the plain text of a Hudong Encyclopedia article. Finally, we train a CRF model to extract the corresponding values from these candidate sentences.

In this thesis, we construct a concept hierarchy from the category system of Hudong Encyclopedia, and we perform experiments on Hudong Encyclopedia articles in the "People" category, achieving excellent performance.
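The keyword-matching step in (3), which picks candidate sentences for each attribute before a CRF model is applied, might look roughly like the sketch below. This is an illustration under stated assumptions, not the thesis's implementation: the attribute names, the keyword lists, the sentence splitter, and the sample text are all hypothetical, and English text stands in for the Chinese articles the thesis works on.

```python
import re

# Hypothetical mapping from attribute to trigger keywords. In the thesis the
# attributes of interest are learned from infobox attribute-value pairs; the
# names and keywords here are made up for illustration.
ATTRIBUTE_KEYWORDS = {
    "birth_date": ["born", "birth"],
    "occupation": ["worked as", "profession", "career"],
}


def split_sentences(text):
    """Split text into sentences on common Chinese and English terminators."""
    pieces = re.findall(r"[^。！？.!?]+[。！？.!?]?", text)
    return [piece.strip() for piece in pieces if piece.strip()]


def candidate_sentences(text, attribute_keywords):
    """Return, per attribute, the sentences containing any of its keywords."""
    candidates = {attr: [] for attr in attribute_keywords}
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for attr, keywords in attribute_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                candidates[attr].append(sentence)
    return candidates


# Made-up sample article about a placeholder person.
article = (
    "Zhang San was born in 1950 in Beijing. "
    "He worked as a teacher. "
    "His hobbies include chess."
)
result = candidate_sentences(article, ATTRIBUTE_KEYWORDS)
for attribute, sentences in result.items():
    print(attribute, sentences)
```

In a pipeline like the one the abstract describes, the sentences selected for each attribute would then be passed to a sequence labeler such as a CRF, which marks the tokens that form the attribute value.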

Keywords/Search Tags:

Hyponymy, CRFs, property values, knowledge base, Chinese online encyclopedia

Related items

1	Research On Chinese Hyponymy Relation Extraction And Application
2	Research On Ontology Learning And Knowledge Acquisition From Chinese Online Encyclopedia
3	Chinese Internet Encyclopedia Of Knowledge Dissemination Study
4	Discovering Entity Relationship And Semantic Annotations Base On Wikipedia Encyclopedia Knowledge Resources
5	Research On The Method Of Knowledge Extraction And Knowledge Base Construction From Hudong Encyclopedia
6	Building A Connected Semantic Knowledge Base Using Heterogeneous Chinese Encyclopedias
7	Taxonomy Induction Research On Knowledge Base From Chinese Encyclopedia
8	Research On Attribute Relation Extraction From Chinese Online Encyclopedia
9	A Research On Slot Filling Technique In Online Encyclopedia-based Knowledge Graph Construction
10	Research On Named Entity Recognition And Relation Extraction Facing To Domain-oriented Knowledge Base Construction