Research On Knowledge Graph Construction Technologies Based On Text Feature Learning

Posted on: 2019-06-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Zeng
GTID: 1368330611992999
Subject: Software engineering
Abstract/Summary:
A knowledge graph is a special semantic network that represents real-world entities and the semantic relationships among them in a structured form. Its basic units are triples, each consisting of a head entity, a tail entity, and the semantic relationship between them. Knowledge graph technology has gone through several stages of development and is widely used in intelligent search, automatic question answering, intelligence analysis, machine translation, and cloud robots. The construction technologies for named entity recognition, relation extraction, and knowledge graph completion are basic and important, and many research results have been produced. The main objective of named entity recognition is to identify knowledge entities in unstructured text. The main goal of relation extraction is to determine whether a sentence containing two entities expresses a semantic relation and, if so, which relations it expresses. Knowledge graph completion can be divided into entity prediction and relation prediction; its main goal is to use knowledge representation learning to discover missing triples and correct wrong ones, thereby improving the quality of the knowledge graph.

The rapid development of Internet technology has had a profound influence on the field of knowledge graphs. Among Internet data, unstructured text, with its wide range of sources, large volume, and fast update speed, plays an important role. To extract knowledge elements from Internet text and improve knowledge graphs, academia and industry have studied a series of construction technologies such as entity recognition, relation extraction, and knowledge representation learning. Early knowledge graph construction technologies rely heavily on complex manual feature engineering and are usually tied closely to a specific application field, so they generalize poorly and produce low-quality knowledge graphs. Construction technologies based on deep learning can learn or extract data features automatically, which greatly alleviates these drawbacks. However, most current deep-learning-based named entity recognition and relation extraction models have weak feature extraction ability for unstructured text, and existing knowledge representation learning models give insufficient consideration to the semantic relationships between entities. This thesis explores these problems and proposes a series of new knowledge graph construction techniques.

For named entity recognition, this thesis proposes a series of artificial neural network models based on character feature learning, to address the weak ability of existing named entity recognition models to capture word semantic features. Built on deep learning methods such as convolutional neural networks and long short-term memory networks, these models learn the rich semantic features of unstructured text sequences at two levels, character and word, and use a conditional random field to model the semantic association between entity tags. Character-level feature learning is the research focus of the proposed models. The main method is to build complex character feature learning modules by concatenating or stacking simple character feature learning modules based on convolutional or recurrent neural networks. These modules process the characters in the text sequence to learn the various character-level features of each word. Experimental results on the English dataset CoNLL-2003 show that the proposed models outperform previous deep-learning-based named entity recognition models in precision, accuracy, recall, and F score. This indicates that the proposed models have stronger character feature learning ability and stronger named entity recognition ability than previous models.

For relation extraction, this thesis proposes a hybrid artificial neural network model that addresses two weaknesses of existing relation extraction models: their limited ability to learn sentence instance features, and their failure to account for multiple semantic relations within a sentence instance. The model uses a convolutional neural network to extract word features at the character level, a bidirectional long short-term memory network to extract sentence instance features at the word level, and a two-level attention mechanism to weight words and sentence instances, reducing the negative effects of unimportant words and incorrect labels. These mechanisms take into account the various semantic features in unstructured text instances, which plays an important role in improving the model's instance learning ability. In addition, the model uses a loss function based on a listwise ranking mechanism to handle the multi-relation problem and to learn more effectively from the information contained in the semantic relation labels. Among deep-learning-based relation extraction models, it makes the most comprehensive use of text feature information and semantic relation label information. Experiments on the ECML dataset show that the proposed model does not rely on external textual descriptions or other language data, that its entity relation extraction ability exceeds that of previous comparable models, and that it achieves the best performance on the precision-recall (PR) curves.

For knowledge graph completion, this thesis proposes a knowledge representation learning model that combines multiple artificial neural network modules for entity relation learning and is built on the strongest translation model. It addresses the inability of previous knowledge representation learning models to consider multi-level indirect relations between entities. The model constructs long-distance entity relation paths from the existing triples in the knowledge graph and uses an indirect relation learning module, with a long short-term memory network at its core, to learn vector representations of entities and relations from these path data. In addition, to model the direct semantic relations that strongly influence entities, the model uses a direct relation learning module based on a simple three-layer artificial neural network, the best performer among existing translation models. This module takes the listwise ranking loss as its optimization goal and has strong knowledge representation learning ability. Knowledge graph completion experiments on the FB15K dataset show that the proposed model achieves better mean rank and hit rate than comparable models in the entity prediction task. In the relation prediction task, the model achieves the best mean rank and a hit rate similar to that of the previous best model.
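As an illustration of the entity prediction evaluation mentioned above, the following is a minimal sketch of how mean rank and hit rate can be computed for a translation-style model. It is not the thesis's model: the abstract does not name the translation model it builds on, so the scoring rule (a head vector plus a relation vector should land near the tail vector, measured by L1 distance) is an assumption, and the two-dimensional embeddings and the names `translation_score`, `tail_rank`, `mean_rank`, and `hits_at_k` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def translation_score(h: Vector, r: Vector, t: Vector) -> float:
    """L1 distance between h + r and t; lower means more plausible (assumed scoring rule)."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def tail_rank(triple: Triple,
              entities: Dict[str, Vector],
              relations: Dict[str, Vector]) -> int:
    """Rank of the true tail among all entities used as candidate tails (1 is best)."""
    head, rel, tail = triple
    h, r = entities[head], relations[rel]
    true_score = translation_score(h, r, entities[tail])
    # Count the candidates that score strictly better than the true tail.
    better = sum(
        1 for name, vec in entities.items()
        if name != tail and translation_score(h, r, vec) < true_score
    )
    return 1 + better


def mean_rank(ranks: List[int]) -> float:
    """Average rank of the correct entity; lower is better."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical toy embeddings, hand-picked for illustration only.
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Berlin": [3.0, 0.0],
    "Germany": [3.0, 2.5],
}
relations = {"capital_of": [0.0, 1.0]}
test_triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
]

ranks = [tail_rank(t, entities, relations) for t in test_triples]
print("ranks:", ranks)
print("mean rank:", mean_rank(ranks))
print("hits@1:", hits_at_k(ranks, 1))
```

In this toy setup the first triple is ranked first and the second is ranked second, because the deliberately misplaced "Germany" vector is farther from "Berlin" plus "capital_of" than "Berlin" itself is. Benchmark evaluations typically also rank candidate heads and may filter out candidates that form other known true triples; both are omitted here for brevity.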
Keywords/Search Tags: Knowledge Graph, Knowledge Graph Construction, Named Entity Recognition, Relation Extraction, Knowledge Representation Learning, Knowledge Graph Completion, Entity Prediction, Relation Prediction