Research On Knowledge Graph Techniques Based On Representation Learning

Posted on: 2019-04-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Z Zhu
GTID: 1488306344459014
Subject: Computer application technology
Abstract/Summary:
In recent years, artificial intelligence technology (AIT), represented by deep learning, has made great progress in semantic search, autonomous driving, intelligent question answering, machine translation, and other fields, and has significantly raised the level of machine intelligence. In fact, the implementation of machine intelligence relies on knowledge graph techniques. As one of the important ways of organizing knowledge in the era of big data, a knowledge graph (KG) describes the concepts and entities of the objective world and their complex relationships in a structured form, and it provides an efficient way to organize, manage, understand, and utilize the massive, heterogeneous, and dynamic data on the Internet. Since Google proposed the knowledge graph in 2012, Internet search companies around the world, such as Microsoft Bing in the United States and Baidu and Sogou in China, have also released their own knowledge graph products. The knowledge graph has rapidly become a research hotspot in artificial intelligence and has attracted considerable attention from academia and industry. With the development of deep learning and brain science, the knowledge graph will become the brain of intelligent machines in the future. This dissertation studies three core knowledge graph techniques: knowledge representation, knowledge extraction, and knowledge fusion. The main research contributions are as follows.

Firstly, regarding the problem of how to represent and process knowledge about the objective world in a computer, a novel translation-based method for knowledge graph embedding that models the correlations of relations is proposed. This dissertation points out a shortcoming of existing knowledge representation methods: all of them map different relations into the vector space separately and ignore the intrinsic correlations among these relations. Different relations may connect to a common entity. For example, the triples (Steve Jobs, PlaceOfBirth, California) and (Apple Inc., Location, California) share the same tail entity, California, which suggests that some related factors may exist between PlaceOfBirth and Location. By analyzing the embedded relation matrices learned by typical knowledge representation methods, the existence of correlations among relations is verified, and they are shown to appear as a low-rank structure in the embedded relation matrix. Based on these correlations, a new method for knowledge graph embedding is proposed, which uses matrix decomposition to express the embedded relation matrix as a product of two low-dimensional matrices in order to characterize the low-rank structure. In this way, the problem of learning the embedded relation matrix is converted into learning the two low-dimensional matrices, and the correlations of relations can be captured effectively during training. Experimental results on public datasets demonstrate that the proposed method is very effective on the standard evaluation tasks for knowledge graph embedding.

Secondly, a new convolutional neural network with a word-level attention mechanism is proposed for composing the representation of a sentence in the relation extraction task. This dissertation assumes that different words in a sentence are differentially informative and that the importance of a word is highly relation-dependent, i.e., the same word may be differentially important for different relations. Based on these considerations, the proposed network dynamically adjusts the weight of each word according to the semantic relation under consideration. As a result, the sentence representation can be composed more precisely, and the result of relation extraction can be effectively improved.

Thirdly, to meet the requirement of entity extraction from massive textual data in the era of big data, distributed conditional random fields (CRFs) based on Spark are designed and implemented in this dissertation. CRFs have been widely applied to Chinese word segmentation, part-of-speech tagging, named entity recognition, and other natural language processing tasks. Traditional single-node CRF tools face many challenges on large-scale text tasks, such as performance bottlenecks caused by the limited processing capability of a single machine. To tackle these problems, SparkCRF, a distributed CRF implementation that runs in a Spark cluster environment, is designed and implemented following the idea of "divide and conquer". Numerous experiments on public datasets have verified that SparkCRF can be used for entity extraction from large-scale text data.

Finally, this dissertation proposes a representation learning based method for knowledge graph entity alignment. Through multi-source knowledge fusion, a large-scale unified knowledge graph can be built from the top level, which will further enhance the ability of a machine to understand the underlying data. As a key technology of knowledge fusion, entity alignment has significant research value. In this dissertation, we propose a novel supervised method for knowledge graph entity alignment based on representation learning. First, the proposed method automatically learns semantic representations (also known as embeddings) for the entities and relations of a knowledge graph in a low-dimensional vector space; these embeddings contain the intrinsic structural information of the knowledge graph and the attribute features of its entities. After that, taking manually aligned entity pairs as prior knowledge, the cross-KG mapping between entities is learned and then used to predict entity alignments. Experiments conducted on real datasets demonstrate that this method can effectively improve the precision of knowledge graph entity alignment.

To sum up, this dissertation proposes a set of methods for knowledge graph techniques, covering knowledge representation, knowledge extraction, and knowledge fusion. I hope this dissertation can promote the further development and application of knowledge graph techniques.
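
The low-rank idea behind the first contribution can be illustrated with a small sketch. It is not the dissertation's implementation: the matrices `A` and `B`, their sizes, the helper names, and the squared-distance score `||h + r - t||^2` (a common translation-based form) are all assumptions chosen for illustration. The sketch builds a relation matrix as a product of two smaller matrices and checks that its rank cannot exceed the inner dimension.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(M):
    """Exact rank via Gauss-Jordan elimination over rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def translation_score(h, r, t):
    """Squared distance ||h + r - t||^2; lower means a more plausible triple (assumed form)."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))


# Hypothetical toy factors: 4 relations, inner dimension 2, embedding dimension 3.
A = [[1, 0], [0, 1], [1, 1], [2, -1]]   # one row of mixing weights per relation
B = [[1, 2, 0], [0, 1, 3]]              # shared basis vectors
R = matmul(A, B)                        # each row is one relation embedding

head = [1, 1, 1]
good_tail = [2, 3, 1]                   # equals head + R[0]
bad_tail = [0, 0, 0]

print(rank(R))
print(translation_score(head, R[0], good_tail), translation_score(head, R[0], bad_tail))
```

Because every relation vector is a combination of the same rows of `B`, correlated relations share parameters, and during training only `A` and `B` would need to be learned instead of the full relation matrix.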
Keywords/Search Tags: representation learning, knowledge graph, knowledge graph techniques, knowledge representation, knowledge extraction, knowledge fusion