Entity Alignment Algorithm Of Encyclopedia Knowledge Base

Posted on: 2020-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: M J He
Full Text: PDF
GTID: 2428330596485354
Subject: Communication and Information System
Abstract/Summary:
In recent years, the amount of data that people encounter and generate in their daily lives has grown rapidly. With the advent of the self-media era and the diversity of users, data on the network has also become increasingly diverse. As a platform for sharing and popularizing knowledge, an online encyclopedia covers many types of knowledge data and is characterized by large scale and varied forms of expression. The basic unit of an online encyclopedia is the encyclopedic entity. The same entity may be referred to by different names, and different entities may share the same name, which makes it difficult to integrate and reuse knowledge data in an encyclopedic knowledge base. Moreover, many large-scale encyclopedia websites in China are built through collaborative editing by netizens; they lack standardized expression, and some of their knowledge data is repetitive or even wrong. If knowledge fusion is performed directly, without entity alignment, the knowledge base will contain overlapping or even self-contradictory entities, and the quality of its knowledge data will be seriously reduced. To accomplish entity alignment in online encyclopedia knowledge bases, and thereby expand the knowledge base and carry out knowledge fusion, this thesis studies entity alignment algorithms for encyclopedia knowledge bases. The main contributions are as follows:

1. Based on the differences between entities in Baidu Encyclopedia and Chinese Wikipedia, an entity alignment algorithm for encyclopedia knowledge bases based on a topic model is proposed. The algorithm uses the Latent Dirichlet Allocation (LDA) model to mine the unstructured descriptive text of encyclopedic entities and incorporates the deep semantics of the text when generating entity feature vectors, which are then used for entity alignment.

2. Because understanding Chinese words requires taking context into account, an improved BP algorithm is proposed. When estimating the hidden parameters of the LDA model, the algorithm considers each word's context and allows the same word within the same text to take different meanings, so that the result better fits the real context.

3. To address the high similarity of unstructured description texts in Baidu Encyclopedia and HDWiki, an entity alignment algorithm for encyclopedia knowledge bases based on a triplet Long Short-Term Memory model is proposed. The algorithm uses the LDA model to generate word vectors and a Long Short-Term Memory (LSTM) network to capture the semantic features of the full text, automatically generating more accurate feature vectors to improve alignment.

Entity data was obtained from Baidu Encyclopedia, Chinese Wikipedia, and HDWiki, and a number of comparative experiments were carried out. The results demonstrate the effectiveness of the proposed algorithms.
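
As a rough illustration of the ideas in the first and third contributions, the sketch below compares entities by feature vectors and evaluates a triplet margin loss. It is not the thesis's implementation: the abstract does not state the similarity measure, the matching rule, or the loss, so the cosine similarity, the threshold of 0.9, the margin of 0.2, the function names, and the toy topic distributions are all assumptions made for illustration. The vectors stand in for per-entity topic distributions that an LDA model would infer from the descriptive text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(source, target, threshold=0.9):
    """Map each source entity to its most similar target entity.

    Assumption: a pair is accepted only if its cosine similarity reaches
    `threshold`; the thesis may use a different decision rule.
    """
    pairs = {}
    for s_name, s_vec in source.items():
        best_name, best_sim = None, -1.0
        for t_name, t_vec in target.items():
            sim = cosine(s_vec, t_vec)
            if sim > best_sim:
                best_name, best_sim = t_name, sim
        if best_name is not None and best_sim >= threshold:
            pairs[s_name] = best_name
    return pairs


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on cosine distance (1 - cosine similarity).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`. In a triplet LSTM setup the three vectors
    would be LSTM encodings of entity texts; here they are plain lists.
    """
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 3-topic distributions for two same-named entities per source.
baidu = {
    "apple_fruit": [0.80, 0.15, 0.05],
    "apple_company": [0.05, 0.10, 0.85],
}
wiki = {
    "Apple": [0.75, 0.20, 0.05],
    "Apple Inc.": [0.10, 0.05, 0.85],
}

matches = align(baidu, wiki)
loss_ok = triplet_loss(baidu["apple_fruit"], wiki["Apple"], wiki["Apple Inc."])
loss_bad = triplet_loss(baidu["apple_fruit"], wiki["Apple Inc."], wiki["Apple"])
```

With these toy vectors, the two entities named "apple" are separated by their topic distributions rather than by their names, which is the situation the abstract describes for same-named entities. The loss is zero for the correctly ordered triplet and positive when the positive and negative examples are swapped.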
Keywords/Search Tags: Entity alignment, Knowledge base, LDA model, BP algorithm, Long Short-Term Memory, Triplet neural network