
Research On The Approach Of Entity Type Inference In Wikipedia

Posted on: 2018-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: B Luo
Full Text: PDF
GTID: 2348330542453047
Subject: Computer Science and Technology
Abstract/Summary:
Knowledge base construction from Wikipedia has attracted much attention in recent years. Various kinds of semantic information have been mined from Wikipedia, and one of the most valuable is type information. Currently, there are two main data sources for automatically inferring entity types across Wikipedia: one is the semi-structured data, such as infoboxes and categories; the other is the abstracts of Wikipedia articles. However, these data sources miss the large-scale main text of Wikipedia articles, which contains plenty of type information. To enrich the type information in knowledge bases, this thesis proposes an approach that infers entity types from all textual data in Wikipedia (i.e., article abstracts plus main text).

There has been some related work on text-based type inference in natural language processing, but it has the following problems: 1) a lot of noise exists in the automatically acquired training data; 2) the methods rely heavily on manually defined features or rules; 3) they cannot explain why an entity has the inferred types. It is therefore worth studying how to overcome these problems and infer entity types from all textual data in Wikipedia. This thesis proposes an approach that acquires training data from Wikipedia, constructs a graph model from these data, and extracts type-inference patterns from the graph model. The pattern extraction approach does not rely on manually defined features or rules, and the extracted patterns can explain why an entity has the inferred types.

The major work and contributions of this thesis comprise the following three aspects:

1) This thesis proposes a method for automatically acquiring training data using heuristic rules and random walk. Existing methods of extracting training data from Wikipedia may yield noisy training data, which has a negative impact on the results.

2) This thesis proposes a fine-grained entity typing approach based on the high-quality training data from Wikipedia. A graph model based on word embeddings is constructed, and a bootstrapping algorithm is used to extract patterns for type inference. The patterns are then matched against entity contexts: if a pattern matches the context of an entity, the type associated with that pattern is inferred as a type of the entity. The matched pattern also serves as an explanation for the inference result. Experimental results show that the proposed approach outperforms existing methods for fine-grained entity type inference on different datasets.

3) The proposed approach can be used to infer entity types across Wikipedia. After performing entity type inference on the whole textual data of Wikipedia, we obtain 10,417,582 new type statements compared with YAGO, enriching each entity with 2.49 types on average.
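The abstract does not detail the heuristic rules or walk parameters behind contribution 1. As one plausible reading, a random walk with restart over a small entity graph can score candidate (entity, type) training pairs, and low-scoring pairs can be dropped as noise. Everything below — the toy graph, the restart probability, and the score threshold — is an illustrative assumption, not the thesis's actual method.

```python
import random
from collections import defaultdict

# Toy entity/type graph: edges link entities to candidate types and to
# related entities (e.g. via shared Wikipedia categories). All names here
# are illustrative, not data from the thesis.
graph = {
    "Albert_Einstein": ["physicist", "person", "Marie_Curie"],
    "Marie_Curie":     ["physicist", "person", "Albert_Einstein"],
    "Berlin":          ["city", "location"],
    "physicist":       ["Albert_Einstein", "Marie_Curie"],
    "person":          ["Albert_Einstein", "Marie_Curie"],
    "city":            ["Berlin"],
    "location":        ["Berlin"],
}

def random_walk_scores(start, steps=10000, restart=0.15, seed=0):
    """Random walk with restart from `start`; the normalized visit
    frequency of a type node acts as a relevance score for the
    candidate (start, type) training pair."""
    rng = random.Random(seed)
    visits = defaultdict(int)
    node = start
    for _ in range(steps):
        if rng.random() < restart or not graph.get(node):
            node = start  # teleport back to the seed entity
        else:
            node = rng.choice(graph[node])
        visits[node] += 1
    return {n: c / steps for n, c in visits.items()}

# Keep only type candidates the walk visits often enough; rarely visited
# pairs are treated as noisy training data and filtered out.
scores = random_walk_scores("Albert_Einstein")
types = {"physicist", "person", "city", "location"}
clean_pairs = [(t, s) for t, s in scores.items() if t in types and s > 0.05]
print(sorted(clean_pairs, key=lambda p: -p[1]))
```

Under this sketch, types connected to the seed entity (directly or via related entities) accumulate visits, while unrelated types such as "city" are never reached and are filtered out.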
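The pattern-matching step in contribution 2 — a pattern matches an entity's context, yields the pattern's type, and the pattern itself explains the inference — can be sketched as follows. The thesis bootstraps its patterns from the graph model, so the hand-written regular expressions here are assumptions for illustration only.

```python
import re

# Illustrative "pattern -> type" pairs; the real patterns are extracted
# by the bootstrapping algorithm, not written by hand.
PATTERNS = [
    (re.compile(r"\bis an?\s+(?:\w+\s+)?physicist\b"), "physicist"),
    (re.compile(r"\bis the capital (?:city )?of\b"), "city"),
]

def infer_types(context):
    """Match each pattern against the entity's textual context; a hit
    yields (inferred type, matched text), so the matched pattern doubles
    as an explanation of why the type was inferred."""
    results = []
    for pattern, etype in PATTERNS:
        m = pattern.search(context)
        if m:
            results.append((etype, m.group(0)))
    return results

print(infer_types("Albert Einstein is a German physicist who developed relativity."))
# → [('physicist', 'is a German physicist')]
```

Returning the matched span alongside the type mirrors the interpretability claim in the abstract: the evidence for each inferred type is the pattern occurrence itself.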
Keywords/Search Tags: Wikipedia, Entity Type Inference, Natural Language Processing