
Research On Key Technologies In Knowledge Base Construction

Posted on: 2021-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: W L Hu
Full Text: PDF
GTID: 2568306194476014
Subject: Computer application technology
Abstract/Summary:
A knowledge base describes entities, concepts, attributes, and their relations in the real world, and it is usually organized as a relational graph. With the development of the Internet, knowledge bases have come to play an important role in areas such as intelligent search, question answering, and personalized recommendation, and they have become an active research topic for companies and research institutions. Early knowledge bases were constructed entirely by domain experts, an approach that cannot meet the needs of the big data era. Because of the diversity of tasks, the complexity of natural language, and the differences among target data, knowledge base construction methods based on information extraction are limited by the availability of high-quality labeled data, especially in areas where labels are scarce or the data is noisy. How to quickly build a knowledge base from unstructured text with scarce resources has therefore become an urgent research question.

To tackle these problems, this thesis focuses on three key technologies in knowledge base construction: named entity recognition, entity relation extraction, and knowledge base representation learning. We aim to explore how to quickly build a large-scale knowledge base from resource-constrained unstructured text. Specifically, the research is divided into the following three parts.

For the named entity recognition subtask, this thesis proposes a weakly supervised entity recognition method based on active learning and self-training. Combining multi-criteria active learning sampling strategies with self-training reduces the workload of manual annotation. In addition, the strong representation capabilities of pre-trained language models allow the model to be cold-started on entirely unlabeled data, further reducing the dependence on labeled data.

For the entity relation extraction subtask, this thesis proposes a method that combines dilated convolution with soft entity type constraints. Using a dilated convolutional network as the text encoder captures long-distance dependencies while remaining computationally efficient. Multi-task learning is used to introduce entity type constraints into the attention mechanism, and more precise attention weights are learned by explicitly accounting for noise in the external knowledge.

For the knowledge base representation learning subtask, this thesis proposes a method based on graph convolution and hierarchical relation embedding. A representation learning architecture based on graph convolutional networks is designed to make full use of the dependencies between triples and to encode entities and relations simultaneously. In addition, global relation embedding is combined with local relation embedding, and the edge structure information learned by the local relation embedding is used to further guide the representation learning of entities and relations.

Experiments on multiple public datasets demonstrate that the work on these three key technologies can effectively reduce the dependence on manually labeled data, improve robustness to noisy data, and learn more accurate representations of the data, which benefits the construction and application of knowledge bases built from unstructured text.
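
The claim that a dilated convolutional encoder captures long-distance dependencies efficiently can be illustrated with a small receptive-field calculation. The sketch below is not the thesis's encoder: the abstract gives no kernel sizes or dilation rates, so the kernel size of 3, the dilation schedule 1, 2, 4, and the helper names `dilated_conv1d`, `receptive_field`, and `measured_span` are all assumptions chosen for illustration. It only shows that stacking layers with growing dilation widens the span of tokens each output position can see, without adding parameters per layer.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D convolution with zero padding.

    Kernel taps are spaced `dilation` positions apart (odd kernel size assumed).
    """
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        total = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - center) * dilation
            if 0 <= pos < len(x):  # positions outside the sequence count as zero
                total += w * x[pos]
        out.append(total)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Closed-form receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def measured_span(kernel_size: int, dilations: Sequence[int], length: int = 65) -> int:
    """Push a unit impulse through the stack and measure how far it spreads."""
    signal = [0.0] * length
    signal[length // 2] = 1.0
    kernel = [1.0] * kernel_size  # all-ones weights, so contributions never cancel
    for d in dilations:
        signal = dilated_conv1d(signal, kernel, d)
    nonzero = [i for i, v in enumerate(signal) if v != 0.0]
    return nonzero[-1] - nonzero[0] + 1


if __name__ == "__main__":
    # Hypothetical schedules: three plain layers vs. three dilated layers.
    for dilations in ([1, 1, 1], [1, 2, 4]):
        print(dilations, receptive_field(3, dilations), measured_span(3, dilations))
```

Under these assumed settings, three undilated layers cover 7 positions while the dilated stack covers 15 with the same number of weights, which is the kind of trade-off the abstract refers to.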
Keywords/Search Tags: Knowledge base, Named entity recognition, Relation extraction, Knowledge graph representation learning, Deep learning