
Multi-source Knowledge Base Fusion And Application For Heterogeneous Data Sources

Posted on: 2021-06-05, Degree: Master, Type: Thesis
Country: China, Candidate: S Y Liu
GTID: 2518306560453144, Subject: Master of Engineering
Abstract/Summary:
Knowledge bases contain rich information resources and underpin many natural language processing tasks as well as the construction of complete knowledge bases. Existing knowledge bases are generally restricted to particular domains, small in data volume, and missing information. Building a general knowledge base by hand is difficult: it is error-prone, slow to update, time-consuming, and labor-intensive. Data fusion based on existing knowledge bases, by contrast, is fast, produces fewer errors, is easy to update, and gives comprehensive coverage. Multi-source heterogeneous knowledge base fusion algorithms therefore have high research value.

Based on open encyclopedia data sources, this thesis studies multi-source heterogeneous knowledge base fusion algorithms. To address data duplication, entity ambiguity, and missing information between heterogeneous knowledge bases, it proposes a "multi-information weighted fusion entity alignment algorithm" and a "Word2vec word vector representation algorithm based on TF-IDF weighting", which respectively complete the "entity alignment" and "attribute fusion" tasks in the fusion of multi-source heterogeneous knowledge bases. Experiments and comparative experiments are set up for the proposed algorithms. The results show that they improve accuracy and recall over existing fusion algorithms, which validates their effectiveness and practicality.

The main contributions of this thesis are as follows:

(1) A multi-information weighted fusion entity alignment algorithm is proposed for the entity alignment task. The algorithm uses dynamic programming to compute the minimum edit distance and trains text feature vectors with the Doc2vec model in order to compute similarities over the entities' structure and attributes. The final entity similarity is obtained as a weighted average of these scores, which completes the entity alignment task.

(2) A "Word2vec word vector representation algorithm based on TF-IDF weighting" is proposed for the attribute fusion task. The algorithm first uses a TF-IDF variant that introduces influence factors to assign a weight to each word in the attribute text, and then trains word vectors with the Word2vec model. The feature vector of each clause in an attribute is obtained as the weighted average of its word vectors. Finally, the similarity between clause feature vectors is computed to complete attribute fusion between entity pairs.

(3) Based on the two proposed algorithms, a "Graph Search" system was developed. The system contains eight functional modules that together cover multi-source heterogeneous knowledge base fusion and information query, achieving automatic data fusion and associated information query. This practical application of the proposed fusion algorithms verifies their effectiveness and practicality in the fusion of multi-source heterogeneous knowledge bases.
Keywords/Search Tags:Knowledge base fusion, Entity alignment, Attribute completion, Feature extraction, Associated storage
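
The abstract does not give formulas or parameters, so the following Python sketch is only an illustration of the building blocks it names: minimum edit distance by dynamic programming, a TF-IDF-weighted average of word vectors, and a weighted average of similarity scores. Everything specific here is an assumption rather than the thesis's method: the function names are hypothetical, the TF-IDF is the plain smoothed form without the "influence factors" the abstract mentions, cosine similarity is assumed as the vector similarity measure, the fusion weights are placeholders, and the word vectors are passed in as a ready-made dictionary instead of being trained with Word2vec or Doc2vec.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Minimum edit (Levenshtein) distance via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Map edit distance to a similarity in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine(u, v) -> float:
    """Cosine similarity; assumed here as the vector similarity measure."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def tfidf_weights(clause, corpus):
    """Plain TF-IDF with smoothed IDF; no influence factors are modeled."""
    tf = Counter(clause)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = (count / len(clause)) * idf
    return weights


def clause_vector(clause, corpus, word_vectors):
    """TF-IDF-weighted average of the word vectors of one tokenized clause."""
    weights = tfidf_weights(clause, corpus)
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, w in weights.items():
        if word in word_vectors:  # skip out-of-vocabulary words
            total = [t + w * x for t, x in zip(total, word_vectors[word])]
            weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def entity_similarity(name_sim, text_sim, w_name=0.5, w_text=0.5):
    """Weighted average of individual scores; weights are placeholders."""
    return (w_name * name_sim + w_text * text_sim) / (w_name + w_text)
```

As a usage example, `edit_distance("kitten", "sitting")` returns 3, and a candidate entity pair could be scored with `entity_similarity(name_similarity(n1, n2), cosine(v1, v2))`, where `v1` and `v2` are clause or document vectors for the two entities' attribute text.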