
Automatic Construction And Representation Method In Text-Oriented Knowledge Graph

Posted on: 2022-02-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y F Dai
GTID: 1528307049956209
Subject: Computer Science and Technology
Abstract/Summary:
At present, artificial intelligence is gradually evolving from perceptual intelligence to cognitive intelligence. People are no longer satisfied with the results produced by statistical machine learning algorithms alone; they are increasingly concerned with the interpretability of those results and with the knowledge contained in the data itself. The knowledge graph, a large-scale semantic network composed of nodes and edges, has become an important form of knowledge representation in the context of cognitive intelligence. Because they are interpretable, support inference, and scale to large sizes, knowledge graphs are widely used in practical applications of artificial intelligence such as intelligent search, intelligent recommendation, and automatic question answering. The knowledge graph can thus be regarded as an important engine for driving machines toward cognitive intelligence, and it is currently a popular research topic in AI. With the advent of the information age, however, traditional knowledge graphs face many challenges: for example, how to build a knowledge graph from unstructured text without manual processing, and how to deal with the data sparsity and computational complexity of large-scale knowledge graphs.

In this dissertation, we investigate and experimentally evaluate several key theoretical and practical issues in the automatic construction and representation of text-oriented knowledge graphs. Extracting entities and relations from unstructured text is the basis for constructing large-scale knowledge graphs. Knowledge graph embedding learning then gives the graph a numerical representation, which lets machines use it for knowledge computation and reasoning more easily and efficiently. Finally, these methods are applied to the prediction of drug-drug interactions to verify their effectiveness and generalizability. The research work and main contributions of this dissertation can be summarized in the following four aspects.

First, to address relation extraction in knowledge graph construction, a novel dual-channel neural network combined with an attention mechanism is proposed. A bidirectional sequence LSTM channel captures the semantic information in the original sentence, while a tree-structured LSTM channel concurrently captures its syntactic information. Then, to identify which words in each sentence carry the most information, the attention mechanism computes a weight for each word, giving the share of information that word contributes. Finally, a fully connected layer combines the information from the two channels and returns the final result. Experimental results on two real-world data sets show that our model makes better use of the information contained in sentences and significantly improves on existing methods in relation classification.

Second, for the static knowledge graph completion problem, we take inspiration from generative adversarial networks: a generator can sample more plausible negative triples, which strengthens the discriminator's ability to tell true triples from false ones and further optimizes the embedding vectors. An adversarial learning framework is therefore introduced to generate more credible negative samples. However, vanishing gradients on discrete data are an inherent problem of traditional generative adversarial networks. This dissertation uses the Wasserstein distance instead of the traditional divergence to address this problem, and proposes a knowledge graph representation learning model based on adversarial networks. In addition, since the text describing an entity also provides rich semantic information, our model uses this additional information to improve the performance of the embedding model. In the experiments, we evaluate the model on two tasks, link prediction and triple classification. The results show that the Wasserstein distance resolves the vanishing-gradient problem on discrete data and accelerates model convergence, and that the additional description information also significantly improves model performance.

Third, most existing knowledge graph representation methods ignore the large amount of temporal information contained in knowledge graphs and cannot process or model these important temporal features. To address this, researchers have in recent years proposed knowledge graph embedding techniques that combine temporal information with the original structured information. However, these methods still construct negative facts only by uniform random sampling, so the resulting negative samples are usually too easy to train an effective model. This dissertation introduces adversarial learning and proposes a new temporal knowledge graph embedding framework to further improve the performance of traditional temporal knowledge graph embedding models. In our framework, a generator constructs high-quality negative triples, and the discriminator learns entity and relation embeddings from the positive and negative samples. Comprehensive experiments on temporal knowledge graphs show that the proposed framework significantly improves the performance of the baseline models, which demonstrates its effectiveness and applicability.

Fourth, to further verify the proposed representation learning method, Chapter 5 of this dissertation uses the knowledge graph representation framework described above to formalize the mining of drug-drug interactions, and proposes a new method for predicting them. Compared with clinical trials or traditional machine-learning-based methods, this method does not require a large number of hand-crafted features to obtain better performance. The latent vectors of an autoencoder are used to generate more plausible negative samples, and the discriminator uses these negative triples together with positive triples to train the knowledge graph embedding model. Unlike the traditional reinforcement learning heuristics used in adversarial learning, this chapter applies the Gumbel-Softmax relaxation to address the vanishing-gradient problem on discrete data and to accelerate the convergence of the knowledge graph representation model for drug interactions.
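
To illustrate the Gumbel-Softmax relaxation mentioned for the drug-drug interaction chapter, here is a minimal sketch in plain Python. It is not the dissertation's implementation: the function name `gumbel_softmax`, the candidate scores, and the temperature values are assumptions chosen only for illustration. The idea is that a generator's hard choice of one negative entity is replaced by a smooth probability vector, so that in an automatic-differentiation framework gradients could pass through the sampling step.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample (a probability vector) over the categories.

    logits: unnormalized scores, one per candidate (hypothetical generator output).
    temperature: positive float; smaller values give samples closer to one-hot.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    noisy = []
    for logit in logits:
        # Draw Gumbel(0, 1) noise as g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u away from 0 and 1
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical generator scores for four candidate replacement entities.
candidate_logits = [2.0, 0.5, -1.0, 0.1]
rng = random.Random(0)

soft_sample = gumbel_softmax(candidate_logits, temperature=1.0, rng=rng)
sharp_sample = gumbel_softmax(candidate_logits, temperature=0.05, rng=rng)
```

Each call returns a vector that sums to 1. Lowering `temperature` pushes the vector toward a one-hot choice of a single candidate, while the index of its largest entry is distributed according to the softmax of the original scores.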
Keywords/Search Tags: Knowledge graph representation learning, Relation classification, Link prediction, Knowledge graph construction, Wasserstein distance, Generative adversarial network