
Chinese Entity Relation Extraction Based On BERT And Knowledge Verification

Posted on: 2021-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Wang
Full Text: PDF
GTID: 2518306302454254
Subject: Applied Statistics
Abstract/Summary:
Entity-relation extraction lets machines automatically extract knowledge, such as entities and relations, from natural language text, with the aim of equipping machines to construct knowledge graphs automatically from massive corpora. Recently, pre-trained language models have achieved great success on natural language processing tasks. We propose to use Bidirectional Encoder Representations from Transformers (BERT) as the encoder for Chinese schema-based information extraction involving multiple relations and entities.

First, we propose a two-stage hierarchical pipeline model that classifies relations before labeling entities. In the first stage, BERT performs multi-label relation classification. In the second stage, we concatenate the predicted relation tokens, as prior information, with the raw text and feed the result to BERT for named entity recognition; entity positions are predicted with a sequence-labeling method, and the relations are then combined with the entities to form SPO (subject-predicate-object) triples. Second, we propose a joint, end-to-end model that performs entity extraction and relation extraction simultaneously. This multi-head selection model uses BERT as its lower layer, modeling named entity recognition with a CRF (Conditional Random Fields) layer and relation extraction with multi-head selection, which predicts the most probable heads and corresponding relations for each token. Experimental results demonstrate the effectiveness of both models, with a clear improvement in F1 over the baseline of up to 10 percentage points when knowledge distillation and model ensembling are applied.

Finally, we create a triple trustworthiness measurement using knowledge graph resources. Common information extraction approaches use only the information in the raw text, ignoring the prior information contained in the knowledge graph of triples, such as its local topological structure and the statistical distribution of entity types. In the first stage, we use distant supervision to acquire new triples that exist in the knowledge resources but not in the prediction set. In the second stage, we use triple classification to select triples with a high confidence level. Triple classification judges whether a given triple is correct, framed as a binary classification problem. An XGBoost model combines effective features such as the statistical conditional distribution of entity-relation pairs in the training knowledge resources (SDValidate), triple scores based on knowledge embeddings such as TransE, and the rank of the neural model's output confidence score. Our method improves the performance of the neural extraction model by 2-4%. We also use SHAP values to analyze feature importance.

In summary, our work proposes two BERT-based pre-trained models for extracting multiple relations and entities, together with a measurement component for collecting and filtering triples. Experimental comparisons show the effectiveness and generality of our framework, which also shows promise for tasks such as question answering and recommendation systems.
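The scoring step of the multi-head selection formulation can be sketched in a few lines. The NumPy sketch below is illustrative only: it scores every (token, head, relation) combination with a small bilinear-style network, as in multi-head selection, but all names, dimension sizes, and the 0.9 threshold are hypothetical rather than taken from the thesis.

```python
import numpy as np

def multi_head_scores(H, U, W, V, b):
    """Score every (token i, head j, relation r) combination.

    H: (n, d) token encodings from the lower (BERT-like) layer.
    U, W: (d, k) projections for the dependent token and the candidate head.
    V: (k, R) relation-specific scoring vectors; b: (k,) bias.
    Returns an (n, n, R) array of independent sigmoid probabilities.
    """
    dep = H @ U                      # (n, k) dependent-token projection
    head = H @ W                     # (n, k) candidate-head projection
    # Broadcast to combine every token with every candidate head, then ReLU.
    z = np.maximum(dep[:, None, :] + head[None, :, :] + b, 0.0)  # (n, n, k)
    logits = z @ V                   # (n, n, R)
    return 1.0 / (1.0 + np.exp(-logits))

# Toy example: 3 tokens, hidden size 4, projection size 5, 2 relations.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
probs = multi_head_scores(H,
                          rng.normal(size=(4, 5)),
                          rng.normal(size=(4, 5)),
                          rng.normal(size=(5, 2)),
                          np.zeros(5))
# Keep every (token, head, relation) triple above a chosen threshold.
selected = [(int(i), int(j), int(r))
            for i, j, r in zip(*np.nonzero(probs > 0.9))]
```

Because each (head, relation) pair gets an independent sigmoid, a token may select several heads at once, which is what lets the joint model handle overlapping triples.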
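Two of the trustworthiness features described above can be sketched directly: a TransE-style plausibility score and an SDValidate-style conditional distribution over entity types. The sketch below uses a toy knowledge resource and hypothetical names; in the thesis, features of this kind are fed to an XGBoost classifier rather than used in isolation.

```python
import numpy as np
from collections import Counter

def transe_score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| suggests a more trustworthy triple."""
    return float(np.linalg.norm(h + r - t))

def sdvalidate_prob(typed_triples, s_type, relation, o_type):
    """Estimate P(relation | subject type, object type) from known triples."""
    pair = Counter((st, ot) for st, _, ot in typed_triples)
    joint = Counter(typed_triples)
    denom = pair[(s_type, o_type)]
    return joint[(s_type, relation, o_type)] / denom if denom else 0.0

# Toy knowledge resource of (subject type, relation, object type) triples.
kb = [("Person", "born_in", "City"),
      ("Person", "born_in", "City"),
      ("Person", "works_for", "Company"),
      ("Person", "lives_in", "City")]

# 2 of the 3 Person-City triples use born_in.
p = sdvalidate_prob(kb, "Person", "born_in", "City")

# A triple whose tail equals head + relation in embedding space scores 0 (best).
s = transe_score(np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

A low TransE distance and a high conditional probability both raise a candidate triple's trustworthiness; the gradient-boosted classifier learns how to weigh such signals against the neural model's own confidence rank.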
Keywords/Search Tags:relation extraction, pre-trained language model, named entity recognition, distant supervision, triple classification