
Unstructured Information Extraction Methods For Domain-Specific Knowledge Graphs

Posted on: 2023-06-04    Degree: Master    Type: Thesis
Country: China    Candidate: J Zhang    Full Text: PDF
GTID: 2568307103485804    Subject: Computer technology
Abstract/Summary:
Domain knowledge graphs allow machines, like humans, to grasp the deeper meaning of text by storing knowledge, which helps prevent faulty reasoning and major decision errors. Constructing a domain knowledge graph, however, requires a huge amount of structured data, and manually converting unstructured text into structured data is expensive and demands strong domain expertise. It is therefore important to study how information extraction techniques can be used to obtain structured data automatically, and to that end this thesis proposes automated unstructured information extraction methods oriented toward domain knowledge graph construction.

Existing information extraction approaches fall mainly into sequence-labeling methods and span-representation methods. Sequence-labeling approaches divide into three directions: tagging, table filling, and sequence-to-sequence. They typically rely on BIO (begin, inside, outside) / BILOU (begin, inside, last, outside, unit) tagging schemes, which struggle with nested entities. Span-based approaches, in contrast, can search all spans in detail and identify nested entities efficiently. Among the most advanced span-based models, SpERT contributes a sufficient number of strong negative samples and localized context, but it still lacks explicit boundary supervision for entities and under-utilizes domain-specific information. We therefore propose an information extraction (IE) method based on attention contribution degree, and an IE method for the judicial domain. The core contributions of this thesis are as follows:

1. To sharpen the model's sensitivity to entity boundaries and its mining of domain-specific information, we introduce the attention contribution degree as a boundary confidence. Specifically, a span classifier with a multilayer perceptron-softmax structure is attached to the attention-head residuals of each layer, so that the model does not lose the original token information as depth increases. Experiments demonstrate the effectiveness of the method on news, scientific, and medical corpora; in particular, it outperforms the current state of the art on the SciERC (scientific information extraction) dataset leaderboard.

2. To address the difficulty of extracting special elements at the document-base level, we select five of the more complex evaluation indicators (provided by the Law School of Xiangtan University and used to evaluate the effectiveness of judicial reform) and, under the guidance of legal professionals, propose novel and more specialized matching rules that combine locating key segments, keyword matching, and multi-step logical reasoning.

3. For entities that matching rules cannot identify because they depend on semantic context, we propose a deep learning model based on a contextual-information computation strategy. The strategy first assembles the normalized inter-word attention scores into multi-channel "attention graphs", then trains multi-scale convolution-pooling layers to compress the "attention graphs" into multi-channel "attention points". The "attention points" serve as contextual information for downstream tasks, strengthening inter-word dependencies. Experiments show that this method outperforms the mainstream BERT-BiLSTM-CRF model.
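The boundary-confidence idea in contribution 1 can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names, the use of received-attention column means as the confidence score, and the scaling of the classifier logits by that confidence are not the thesis's actual implementation, which attaches trained MLP-softmax classifiers to the attention-head residuals of every Transformer layer.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def boundary_confidence(attn, start, end):
    """Attention contribution of a span's boundary tokens.

    `attn` is a (seq_len, seq_len) row-normalized attention map; the
    column mean measures how much attention a token receives from the
    whole sentence, which this sketch treats as boundary confidence.
    """
    received = attn.mean(axis=0)  # average attention each token receives
    return (received[start] + received[end]) / 2.0

def classify_span(span_repr, W, b, conf):
    """Single-layer perceptron + softmax span classifier whose logits
    are scaled by the boundary confidence of the endpoint tokens."""
    logits = span_repr @ W + b
    return softmax(conf * logits)

rng = np.random.default_rng(0)
attn = softmax(rng.normal(size=(6, 6)))   # toy row-normalized attention map
conf = boundary_confidence(attn, start=1, end=3)
probs = classify_span(rng.normal(size=8), rng.normal(size=(8, 4)),
                      np.zeros(4), conf)  # class distribution for the span
```

A low boundary confidence flattens the class distribution toward uniform, so spans with weakly attended endpoints are classified less assertively.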
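The rule pipeline in contribution 2 (locate a key segment, match keywords, then apply logical reasoning) might look roughly like the sketch below. The segment header, keywords, and sample document are invented placeholders, not the actual rules developed with the legal professionals.

```python
import re

def locate_segment(document, header):
    """Return the paragraph that starts with `header`, i.e. the key
    segment of the judgment document, or None if it is absent."""
    for para in document.split("\n\n"):
        if para.strip().startswith(header):
            return para
    return None

def extract_element(document):
    """Combine segment location, keyword matching and a simple logical
    check to decide a yes/no evaluation indicator (placeholder logic)."""
    seg = locate_segment(document, "Court opinion")
    if seg is None:
        return "unknown"
    mentions_mediation = re.search(r"mediat(e|ion)", seg) is not None
    mentions_refusal = "refused" in seg or "declined" in seg
    # logical reasoning step: mediation mentioned AND not refused -> positive
    return "yes" if mentions_mediation and not mentions_refusal else "no"

doc = "Case facts\n\n...\n\nCourt opinion: the parties agreed to mediation."
```

Running `extract_element(doc)` on the toy document returns `"yes"`; a document with no "Court opinion" segment returns `"unknown"` rather than a guess, which is the conservative behavior a downstream evaluation indicator needs.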
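The "attention graph" to "attention point" compression in contribution 3 can be sketched as follows. The pooling scales and the final per-channel averaging are assumptions for illustration: the thesis trains multi-scale convolution-pooling layers, which a fixed average pool only approximates.

```python
import numpy as np

def attention_graphs(scores):
    """Row-normalize raw attention scores per head, giving a
    multi-channel 'attention graph' of shape (heads, seq, seq)."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_points(graphs, scales=(1, 2)):
    """Compress each attention-graph channel at several scales into one
    scalar per (scale, head): the 'attention points'. A fixed average
    pool stands in for the trained convolution-pooling layers."""
    heads, n, _ = graphs.shape
    points = []
    for s in scales:
        m = (n // s) * s                          # crop so n divides by s
        pooled = graphs[:, :m, :m].reshape(heads, m // s, s, m // s, s)
        pooled = pooled.mean(axis=(2, 4))         # s x s average pooling
        points.append(pooled.mean(axis=(1, 2)))   # one point per channel
    return np.concatenate(points)                 # (len(scales) * heads,)

rng = np.random.default_rng(0)
graphs = attention_graphs(rng.normal(size=(4, 6, 6)))  # 4 heads, 6 tokens
pts = attention_points(graphs)                         # 2 scales x 4 heads
```

In a real model the pooled channels would feed a downstream tagger as extra contextual features; here they only demonstrate the shape of the computation. Note that with a plain average pool every point collapses to 1/seq_len (each row of an attention graph sums to 1), which is exactly why the thesis learns the convolution-pooling weights instead.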
Keywords/Search Tags: information extraction, attention score, Transformer pre-training model, domain knowledge graph