Knowledge Fusion Based On Machine Learning Model And Crowdsourcing

Posted on: 2018-10-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C H Li
GTID: 1318330542459090
Subject: Computer application technology
Abstract/Summary:
Knowledge plays an important role in machine understanding and the Semantic Web. In recent years, automatic knowledge acquisition from unstructured and semi-structured Web data has been studied extensively. However, knowledge extracted from these data sources is pervasively semantically heterogeneous and uncertain, so effective knowledge fusion methods are urgently needed to integrate multi-source, semantically heterogeneous knowledge into a central knowledge base. Meanwhile, crowdsourcing has attracted wide research interest and has been adopted in many fields, such as databases, image search, natural language processing and information retrieval. It has proved to be a viable and cost-effective alternative to human experts and an effective way to handle tasks that are hard for computers. This dissertation therefore proposes hybrid machine learning and crowdsourcing solutions to knowledge fusion, addressing the problems of semantic matching, knowledge refining and knowledge inference. The main work and contributions are as follows:

(1) For the problem of ontology heterogeneity, an ontology matching model based on Markov logic networks is proposed to compute ontology alignments. Compared with existing ontology matching methods, the Markov logic network model has several advantages. It naturally combines first-order predicate logic with probabilistic graphical models, which makes it possible to combine various matching strategies and provides an excellent framework for ontology matching. An improved match-propagation strategy is proposed to model relationships between matches; it helps identify correct correspondences and find correspondences that do not appear among the candidates. In addition, the model selects the matching threshold interactively, collecting user feedback to determine the optimal threshold. The model effectively improves matching performance and robustness.

(2) To handle the semantic heterogeneity of Web tables, this dissertation proposes a unified matching and cleaning approach that leverages the power of knowledge bases and the crowd. It computes semantic probabilities for tables based on knowledge bases and refines the semantic labels using crowdsourcing, which helps overcome the incompleteness of Web tables. Table matching and cleaning have so far been studied in isolation. The proposed approach annotates table columns with types, and column pairs with relationships, from the knowledge base, and annotates each data value as correct or incorrect according to whether it matches the table semantics. Table matches and data repairs are then generated from these annotations. Under a given budget, the approach intelligently selects the optimal crowdsourcing tasks and infers, from the crowd answers, the concepts that best model each column. Compared with traditional schema matching methods, it handles Web data better and improves performance on both matching and cleaning.

(3) Automatically constructed knowledge bases are often very noisy. Automatic knowledge-refining algorithms can improve their quality but are far from perfect. We therefore leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. Because human labeling is costly, an important research challenge is how to use limited human resources to maximize the quality improvement of a knowledge base. To address this challenge, we first introduce the concept of semantic constraints, which can be used to detect potential errors and to perform inference among candidate facts. Then, based on these semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts to crowdsource and prune unnecessary questions.

(4) Knowledge bases are often highly incomplete. A promising approach is to embed knowledge bases into latent spaces and make inferences by learning and operating on latent representations. Existing embedding models are all supervised, learning entity and relation vectors from triplets. This dissertation proposes a novel knowledge representation learning method with sub-space projection. It first estimates unsupervised entity name vectors from a large unlabeled text corpus; these vectors encode syntactic and semantic properties of entity names. It then adapts the unsupervised entity name vectors to an embedding sub-space using the available triplets, learning the knowledge representations and the projection matrix simultaneously. The technique is particularly suited to situations in which only a small amount of labeled data is available, as well as to zero-shot scenarios.
Keywords/Search Tags: Knowledge Fusion, Ontology Matching, Knowledge Refining, Markov Logic Networks, Crowdsourcing, Embedding Learning
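The interactive threshold selection in contribution (1) can be illustrated with a small sketch. It assumes the matcher has already produced a confidence score for each candidate correspondence (for example, a marginal probability from Markov logic inference) and that a user has labeled some candidates as correct or incorrect; it then picks the threshold that maximizes F1 on the labeled candidates. The names `f1_at` and `select_threshold`, the F1 criterion, and the example data are illustrative assumptions rather than the dissertation's actual procedure, and the step of choosing which matches to show the user is not modeled.

```python
from typing import Dict, Tuple

Match = Tuple[str, str]  # (source-ontology entity, target-ontology entity)

def f1_at(threshold: float, scores: Dict[Match, float], feedback: Dict[Match, bool]) -> float:
    """F1 of the alignment {m : score(m) >= threshold}, measured on user-labeled matches only."""
    tp = fp = fn = 0
    for match, is_correct in feedback.items():
        predicted = scores.get(match, 0.0) >= threshold
        if predicted and is_correct:
            tp += 1
        elif predicted:
            fp += 1
        elif is_correct:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_threshold(scores: Dict[Match, float], feedback: Dict[Match, bool]) -> float:
    # The alignment only changes at observed scores, so those are the only thresholds worth testing.
    candidates = sorted(set(scores.values()))
    return max(candidates, key=lambda t: f1_at(t, scores, feedback))

# Hypothetical match confidences and user verdicts.
scores = {("Person", "Human"): 0.92, ("Paper", "Article"): 0.81,
          ("Author", "Writer"): 0.64, ("Review", "Reviewer"): 0.58,
          ("Chair", "Seat"): 0.41}
feedback = {("Person", "Human"): True, ("Paper", "Article"): True,
            ("Author", "Writer"): True, ("Review", "Reviewer"): False,
            ("Chair", "Seat"): False}
best = select_threshold(scores, feedback)
print(best)  # 0.64
```

Testing only the observed scores is enough because moving the threshold between two adjacent scores never changes which matches are accepted.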
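Budgeted task selection for Web-table columns in contribution (2) could look roughly like the following sketch. Each column starts with a probability distribution over candidate knowledge-base types (assumed to come from matching cell values against the knowledge base). The budget is spent on the columns whose distributions have the highest entropy; each such column is answered by several workers and resolved by majority vote, and the remaining columns keep their most probable type. Entropy-based selection is a common heuristic swapped in here for illustration, not the dissertation's task-selection algorithm; column-pair relationships and data repairs are omitted, and `annotate_columns`, `ask_workers`, and the example data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of one column's type distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def annotate_columns(type_probs: Dict[str, Dict[str, float]],
                     ask_workers: Callable[[str, List[str]], List[str]],
                     budget: int) -> Tuple[Dict[str, str], Set[str]]:
    # Spend the budget on the columns whose KB-derived type distribution is most uncertain.
    ranked = sorted(type_probs, key=lambda c: entropy(type_probs[c]), reverse=True)
    asked = set(ranked[:budget])
    labels = {}
    for col, dist in type_probs.items():
        if col in asked:
            votes = Counter(ask_workers(col, sorted(dist)))
            labels[col] = votes.most_common(1)[0][0]  # majority vote over worker answers
        else:
            labels[col] = max(dist, key=dist.get)  # keep the most probable KB type
    return labels, asked

# Hypothetical columns and simulated worker answers.
type_probs = {
    "col0": {"Country": 0.95, "City": 0.05},
    "col1": {"Film": 0.45, "Book": 0.55},
    "col2": {"Person": 0.6, "Athlete": 0.4},
}
answers = {"col1": ["Film", "Film", "Book"]}
labels, asked = annotate_columns(type_probs, lambda col, options: answers[col], budget=1)
# Only col1 is asked; the crowd overrides its KB estimate ("Book") with "Film".
```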
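For contribution (3), the following hypothetical sketch shows how one kind of semantic constraint, a functional relation under which a subject has at most one object (such as a birthplace), lets a single crowd answer resolve several candidate facts. Candidates are ranked by a simple expected-benefit heuristic, 1 + p × (number of conflicting candidates), where p is the extractor's confidence, and questions whose answers are already implied are skipped. The heuristic, the `refine` function, and the simulated crowd are assumptions made for illustration; the dissertation's rank-based and graph-based algorithms are not reproduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)

def refine(candidates: Dict[Fact, float],
           functional: Set[str],
           ask_crowd: Callable[[Fact], bool],
           budget: int) -> Tuple[Dict[Fact, bool], int]:
    # Candidates sharing a subject and a functional relation conflict: at most one can be true.
    groups: Dict[Tuple[str, str], List[Fact]] = defaultdict(list)
    for fact in candidates:
        subject, relation, _ = fact
        if relation in functional:
            groups[(subject, relation)].append(fact)

    def conflicts(fact: Fact) -> List[Fact]:
        subject, relation, _ = fact
        return [f for f in groups.get((subject, relation), []) if f != fact]

    # Expected number of facts one question resolves: the fact itself, plus its
    # conflicts (inferred false) with probability p that the crowd answers "true".
    ranked = sorted(candidates,
                    key=lambda f: 1 + candidates[f] * len(conflicts(f)),
                    reverse=True)

    labels: Dict[Fact, bool] = {}
    questions = 0
    for fact in ranked:
        if fact in labels:
            continue  # already implied by an earlier answer; question pruned
        if questions == budget:
            break
        questions += 1
        labels[fact] = ask_crowd(fact)
        if labels[fact]:
            for other in conflicts(fact):
                labels[other] = False  # inferred from the functional constraint
    return labels, questions

# Hypothetical candidate facts with extractor confidences and a simulated crowd.
candidates = {
    ("Turing", "birthPlace", "London"): 0.8,
    ("Turing", "birthPlace", "Manchester"): 0.3,
    ("Turing", "birthPlace", "Cambridge"): 0.2,
    ("Turing", "field", "Mathematics"): 0.9,
}
truth = {("Turing", "birthPlace", "London"), ("Turing", "field", "Mathematics")}
labels, asked = refine(candidates, {"birthPlace"}, lambda f: f in truth, budget=2)
# Two questions resolve all four candidates.
```

Without the constraint, each of the four candidates would need its own question; here the "true" answer for London settles the two competing birthplaces for free.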
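The sub-space projection idea in contribution (4) can be sketched in NumPy as follows. Each entity is represented by a fixed name vector x (assumed here to be pre-trained on unlabeled text), and its knowledge embedding is the projection Mx. A projection matrix M and relation vectors are trained jointly with a TransE-style margin loss on ||Mx_h + r - Mx_t||², using randomly corrupted tails as negatives. The TransE-style score, the norm bound on M, keeping the name vectors fixed, and all hyperparameters and data are assumptions, not details taken from the dissertation.

```python
import numpy as np

def score(M, R, X, h, r, t):
    """Squared TransE-style distance ||M x_h + R[r] - M x_t||^2 in the projected sub-space."""
    u = M @ (X[h] - X[t]) + R[r]
    return float(u @ u), u

def train(X, triplets, n_rel, k=8, margin=1.0, lr=0.01, epochs=100, max_norm=10.0, seed=0):
    """Jointly learn a projection M (k x d) and relation vectors R (n_rel x k); X stays fixed."""
    rng = np.random.default_rng(seed)
    n_ent, d = X.shape
    M = rng.normal(scale=0.1, size=(k, d))
    R = rng.normal(scale=0.1, size=(n_rel, k))
    for _ in range(epochs):
        for h, r, t in triplets:
            t_neg = int(rng.integers(n_ent))  # corrupt the tail uniformly at random
            if t_neg == t:
                continue
            f_pos, u_pos = score(M, R, X, h, r, t)
            f_neg, u_neg = score(M, R, X, h, r, t_neg)
            if margin + f_pos - f_neg <= 0:
                continue  # hinge loss already zero for this pair
            # Gradient of (f_pos - f_neg) with respect to M and R[r].
            grad_M = 2 * np.outer(u_pos, X[h] - X[t]) - 2 * np.outer(u_neg, X[h] - X[t_neg])
            grad_r = 2 * (u_pos - u_neg)
            M -= lr * grad_M
            R[r] -= lr * grad_r
            norm = np.linalg.norm(M)  # keep M bounded so scores cannot blow up
            if norm > max_norm:
                M *= max_norm / norm
    return M, R

# Hypothetical data: 20 entities with 16-d name vectors, one relation linking i -> i + 10.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 16))
triplets = [(i, 0, i + 10) for i in range(10)]
M, R = train(X, triplets, n_rel=1)

# Zero-shot: an entity absent from all triplets is embedded through the same projection.
x_new = rng.normal(size=16)
e_new = M @ x_new
```

Because the embedding is a function of the name vector rather than a free per-entity parameter, an entity that appears in no training triplet still receives a usable embedding, which is what makes the zero-shot setting possible.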