Key Techniques Of Entity Alignment In Knowledge Bases

Posted on: 2019-06-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhuang
GTID: 1368330623461922
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the World Wide Web, entity alignment across knowledge bases has become a hot research topic in recent years. The goal of alignment is to link multiple knowledge bases effectively and to create a large-scale, unified knowledge base at the top level, which helps machines understand the data and supports more intelligent applications. However, many research challenges remain in data quality and scalability, especially in the context of big data. The main contributions of this dissertation are as follows:

1. A Survey on Entity Alignment of Knowledge Base
In this chapter, we survey the techniques and algorithms for entity alignment on knowledge bases developed over the past decade, and aim to provide options for further research by classifying and summarizing existing methods. First, the entity alignment problem is formally defined. Second, the overall architecture is summarized and the research progress is reviewed in detail in terms of algorithms, feature matching, and indexing. The entity alignment algorithms are the key to solving this problem and can be divided into pair-wise methods and collective methods. The most commonly used collective entity alignment algorithms are discussed in detail from local and global perspectives. Finally, open research issues are discussed and possible future research directions are outlined.

2. A Scalable Partition-Blocking-Based Alignment Framework for Aligning Large-Scale Knowledge Bases
Existing knowledge-base alignment algorithms have three limitations: (1) they are not scalable, (2) they produce poor-quality results, and (3) they are not fully automatic. To address these limitations, we develop a scalable partition-and-blocking-based alignment framework, named PBA, which can automatically and efficiently align knowledge bases with tens of millions of instances. PBA contains three steps. (1) Partition: we propose a new hierarchical agglomerative co-clustering algorithm to partition the class hierarchy of the knowledge base into multiple class partitions. (2) Blocking: we judiciously divide the instances in the same class partition into small blocks to further improve performance. (3) Alignment: we compute the similarity of the instances in each block using a vector space model and align the instances whose similarity is high. We also develop a parallel algorithm to further improve efficiency. The automatic algorithms in this chapter serve as the foundation for the later work on human-machine algorithms.

3. A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases
Because of inconsistency and uncertainty in the data, automatic techniques for aligning large-scale knowledge bases (KBs) usually achieve low quality, especially low recall. Open crowdsourcing platforms make it possible to harness the crowd to improve alignment quality. To this end, in this chapter we propose a novel hybrid human-machine framework for large-scale KB integration. We first partition the entities of different KBs into many smaller blocks based on their relations. We then construct a partial order on these partitions and develop an inference model that crowdsources a set of tasks and infers the answers of other tasks from the crowdsourced ones. Next, we formulate the question selection problem, which, given a monetary budget B, selects B crowdsourced tasks to maximize the number of inferred tasks. We prove that this problem is NP-hard and propose greedy algorithms that address it with an approximation ratio of 1 - 1/e.

4. Crowdsourced Entity Alignment: A Decision Theory Based Approach
In crowdsourced entity alignment tasks, there are usually large numbers of candidate pairs to be verified by crowd workers, and each pair is assigned to multiple workers to achieve high quality. This raises two fundamental problems: (1) question selection: which questions are the most beneficial to crowdsource, and (2) question assignment: which workers should be assigned to answer a selected question? In this chapter, we address these two problems using decision theory. First, we define the problems under two budget constraints: the first takes the marginal gain into account, and the second focuses on a limited budget. Then, we formulate the decision-making problems under the different budget constraints and build an influence diagram to perform result inference. We prove that the problem is NP-hard and propose two efficient greedy algorithms to address the two problems.
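
As an illustration of the budgeted question selection described in contribution 3, the following is a minimal sketch of the standard greedy procedure for a coverage-style objective, which is the textbook setting in which a 1 - 1/e approximation guarantee holds. It is not the dissertation's algorithm. It assumes, as a simplification, that each candidate task comes with a fixed set of tasks (including itself) whose answers would be settled if it were crowdsourced, regardless of the answer received. The names `greedy_select`, `inferable`, and the task labels `q1` to `q5` are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_select(inferable: Dict[Hashable, Set[Hashable]], budget: int) -> List[Hashable]:
    """Pick up to `budget` tasks, each time taking the largest marginal gain.

    `inferable[t]` is the (assumed, answer-independent) set of tasks that
    would be resolved by crowdsourcing task `t`, including `t` itself.
    """
    covered: Set[Hashable] = set()   # tasks already resolved
    chosen: List[Hashable] = []      # tasks sent to the crowd, in pick order
    remaining = dict(inferable)      # shallow copy so the input is not mutated

    for _ in range(budget):
        best, best_gain = None, 0
        for task, reach in remaining.items():
            gain = len(reach - covered)  # newly resolved tasks only
            if gain > best_gain:
                best, best_gain = task, gain
        if best is None:  # no remaining task resolves anything new
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical toy instance: q1 resolves q2 and q3 as well, q4 resolves q5.
example = {
    "q1": {"q1", "q2", "q3"},
    "q2": {"q2", "q3"},
    "q3": {"q3"},
    "q4": {"q4", "q5"},
    "q5": {"q5"},
}
selected = greedy_select(example, budget=2)
print(selected)  # ['q1', 'q4']
```

With a budget of 2, the sketch picks `q1` (three tasks resolved) and then `q4` (two more), so all five tasks are settled by two crowdsourced questions. In the setting the abstract describes, what can be inferred depends on the partial order and on the answers the crowd returns, so the gain computation would be more involved than the set difference used here.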
Keywords/Search Tags: Knowledge Base, Entity Alignment, Human-Machine Method, Crowdsourced Task Assignment, Decision Theory