
Research On The Approach Of Entity Type Inference In Wikipedia

Posted on: 2018-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: B Luo
Full Text: PDF
GTID: 2348330542453047
Subject: Computer Science and Technology
Abstract/Summary:
Knowledge base construction from Wikipedia has attracted much attention in recent years. Various kinds of semantic information have been mined from Wikipedia, and one of the most valuable is type information. Currently, there are two main data sources for automatically inferring entity types across Wikipedia: one is the semi-structured data, such as infoboxes and categories; the other is the abstracts of Wikipedia articles. However, these data sources miss the large-scale main text of Wikipedia articles, which contains plenty of type information. To enrich the type information in knowledge bases, this thesis proposes an approach that infers entity types from all textual data in Wikipedia (i.e., article abstracts plus main text).

There has been some related work on text-based type inference in natural language processing, but it has the following problems: 1) a lot of noise exists in the automatically acquired training data; 2) the methods rely heavily on manually defined features or rules; 3) they cannot explain why an entity has the inferred types. It is therefore worth studying how to overcome these problems and infer entity types from all textual data in Wikipedia. This thesis proposes an approach that acquires training data from Wikipedia, constructs a graph model from these data, and extracts type-inference patterns from the graph model. The pattern extraction approach does not rely on manually defined features or rules, and the extracted patterns can explain why an entity has the inferred types.

The major work and contributions of this thesis comprise the following three aspects:

1) This thesis proposes a method for automatically acquiring training data using heuristic rules and random walk. Existing methods of extracting training data from Wikipedia may yield noisy training data, which has a negative impact on the results.

2) This thesis proposes a fine-grained entity typing approach based on the high-quality training data from Wikipedia. A graph model based on word embeddings is constructed, and a bootstrapping algorithm is used to extract patterns for type inference. The patterns are then matched against entity contexts: if a pattern matches the context of an entity, the type associated with that pattern is inferred as a type of the entity. The matched pattern also serves as an explanation for the inference result. Experimental results show that the proposed approach outperforms existing methods for fine-grained entity type inference on different datasets.

3) The proposed approach can be used to infer entity types across Wikipedia. After performing entity type inference on the whole textual data of Wikipedia, we obtain 10,417,582 new type statements compared with YAGO, enriching each entity with 2.49 types on average.
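The abstract does not detail the heuristic rules or walk parameters behind contribution 1. As one plausible reading, a random walk with restart over a small entity graph can score candidate (entity, type) training pairs, and low-scoring pairs can be dropped as noise. Everything below — the toy graph, the restart probability, and the score threshold — is an illustrative assumption, not the thesis's actual method.

```python
import random
from collections import defaultdict

# Toy entity/type graph: edges link entities to candidate types and to
# related entities (e.g. via shared Wikipedia categories). All names here
# are illustrative, not data from the thesis.
graph = {
    "Albert_Einstein": ["physicist", "person", "Marie_Curie"],
    "Marie_Curie":     ["physicist", "person", "Albert_Einstein"],
    "Berlin":          ["city", "location"],
    "physicist":       ["Albert_Einstein", "Marie_Curie"],
    "person":          ["Albert_Einstein", "Marie_Curie"],
    "city":            ["Berlin"],
    "location":        ["Berlin"],
}

def random_walk_scores(start, steps=10000, restart=0.15, seed=0):
    """Random walk with restart from `start`; the normalized visit
    frequency of a type node acts as a relevance score for the
    candidate (start, type) training pair."""
    rng = random.Random(seed)
    visits = defaultdict(int)
    node = start
    for _ in range(steps):
        if rng.random() < restart or not graph.get(node):
            node = start  # teleport back to the seed entity
        else:
            node = rng.choice(graph[node])
        visits[node] += 1
    return {n: c / steps for n, c in visits.items()}

# Keep only type candidates the walk visits often enough; rarely visited
# pairs are treated as noisy training data and filtered out.
scores = random_walk_scores("Albert_Einstein")
types = {"physicist", "person", "city", "location"}
clean_pairs = [(t, s) for t, s in scores.items() if t in types and s > 0.05]
print(sorted(clean_pairs, key=lambda p: -p[1]))
```

Under this sketch, types connected to the seed entity (directly or via related entities) accumulate visits, while unrelated types such as "city" are never reached and are filtered out.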
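The pattern-matching step in contribution 2 — a pattern matches an entity's context, yields the pattern's type, and the pattern itself explains the inference — can be sketched as follows. The thesis bootstraps its patterns from the graph model, so the hand-written regular expressions here are assumptions for illustration only.

```python
import re

# Illustrative "pattern -> type" pairs; the real patterns are extracted
# by the bootstrapping algorithm, not written by hand.
PATTERNS = [
    (re.compile(r"\bis an?\s+(?:\w+\s+)?physicist\b"), "physicist"),
    (re.compile(r"\bis the capital (?:city )?of\b"), "city"),
]

def infer_types(context):
    """Match each pattern against the entity's textual context; a hit
    yields (inferred type, matched text), so the matched pattern doubles
    as an explanation of why the type was inferred."""
    results = []
    for pattern, etype in PATTERNS:
        m = pattern.search(context)
        if m:
            results.append((etype, m.group(0)))
    return results

print(infer_types("Albert Einstein is a German physicist who developed relativity."))
# → [('physicist', 'is a German physicist')]
```

Returning the matched span alongside the type mirrors the interpretability claim in the abstract: the evidence for each inferred type is the pattern occurrence itself.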
Keywords/Search Tags: Wikipedia, Entity Type Inference, Natural Language Processing