Research On Information Extraction Method For Knowledge Graph Construction In Industrial Field

Posted on: 2021-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Deng
Full Text: PDF
GTID: 2518306470463204
Subject: Computer Science and Technology
Abstract/Summary:
As an important technology for industrial informatization, information extraction for knowledge graph construction can systematically acquire and structure industrial domain knowledge from massive raw data. In practice, however, it encounters several difficulties. On the one hand, data in the industrial field, and professional data in particular, is hard to obtain because of confidentiality, which makes it difficult to determine a data source for information extraction. On the other hand, the quality of the extracted triples directly determines the quality of the constructed knowledge graph, so how to extract the entities and relations of a triple jointly by fusing the two tasks is a research focus. To address these two problems, this paper proposes an information extraction framework based on a study of existing algorithms, together with targeted improvements to the algorithms involved. The main research contents are as follows:

1) Regarding the difficulty of obtaining data, this paper draws on open-source databases (such as Wikipedia) and processes their massive data with text mining algorithms such as the LDA topic model to obtain text relevant to the industrial field. Working from two aspects, high-frequency words and low-frequency words, it improves the text classification performance of the LDA topic model by changing its topic-word distribution.

2) In current information extraction algorithms, the main tasks are named entity recognition, relation extraction, and joint extraction of entities and relations. Among these, the joint extraction model relies mainly on research progress in the models for the two sub-tasks. This paper studies the most popular models for the two sub-tasks and makes some improvements to them.

3) Joint extraction of entities and relations is a fusion task whose key point is how to exploit the correlation between the two sub-tasks to obtain better results. Building on an integration of the best models for named entity recognition and relation extraction, this paper proposes a joint extraction model with highly shared parameters.

Experiments show that the proposed joint extraction model can effectively extract triples from text, and comparison with other models verifies the effectiveness of its parameter sharing strategy. The experiments also show that the proposed method of obtaining data from open domains and extracting domain-related triples is effective to a certain degree and can help promote the construction of knowledge graphs for industrial informatization.
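
The data-selection idea in item 1) of the abstract, keeping only open-domain text that a topic model assigns to the industrial field, can be illustrated with a small sketch. This is not the thesis's improved LDA and it does not perform LDA inference; it only shows how an already-estimated topic-word distribution might be used to filter documents. The topic names, word probabilities, the `UNSEEN` floor, and the function names are all hypothetical.

```python
import math

# Hypothetical topic-word distributions (toy numbers, not from the thesis).
# In practice these would come from a trained topic model such as LDA.
TOPIC_WORD = {
    "industrial": {"turbine": 0.3, "valve": 0.3, "sensor": 0.2, "pump": 0.2},
    "sports": {"match": 0.4, "goal": 0.3, "team": 0.3},
}
UNSEEN = 1e-6  # assumed floor probability for words a topic does not contain


def topic_scores(tokens):
    """Return the mean log-probability of the tokens under each topic."""
    scores = {}
    for topic, dist in TOPIC_WORD.items():
        total = sum(math.log(dist.get(tok, UNSEEN)) for tok in tokens)
        scores[topic] = total / max(len(tokens), 1)
    return scores


def is_industrial(text):
    """Keep a document only if 'industrial' is its best-scoring topic."""
    tokens = text.lower().split()
    if not tokens:
        return False  # an empty document carries no evidence for any topic
    scores = topic_scores(tokens)
    return max(scores, key=scores.get) == "industrial"


docs = [
    "the pump valve failed near the turbine",
    "the team scored a goal in the match",
]
kept = [d for d in docs if is_industrial(d)]
```

Here `kept` retains only the first document. Changing the entries of `TOPIC_WORD` changes which documents pass the filter, which is the general lever the abstract refers to when it speaks of modifying the topic-word distribution.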
Keywords/Search Tags:Knowledge Graph, Deep Learning, Topic Model, Named Entity Recognition, Joint Extraction of Entities and Relations