
Chinese Entity Relation Extraction Based On BERT And Knowledge Verification

Posted on: 2021-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Wang
Full Text: PDF
GTID: 2518306302454254
Subject: Applied Statistics
Abstract/Summary:
Entity-relation extraction lets machines automatically extract knowledge, such as entities and relations, from natural language text, with the aim of equipping machines to construct knowledge graphs automatically from massive corpora. Recently, pre-trained language models have achieved great success on natural language processing tasks. We propose to use Bidirectional Encoder Representations from Transformers (BERT) as the encoder for Chinese schema-based information extraction involving multiple relations and entities.

First, we propose a two-stage hierarchical pipeline model that classifies relations before labeling entities. In the first stage, BERT performs multi-label relation classification. In the second stage, we concatenate the predicted relation tokens, as prior information, with the raw text and feed the result to BERT for named entity recognition; entity positions are predicted with a sequence-labeling method, and the relations are then combined with the entities to form SPO (subject-predicate-object) triples. Second, we propose a joint, end-to-end model that performs entity extraction and relation extraction simultaneously. This multi-head selection model uses BERT as its lower layer, modeling named entity recognition with a CRF (Conditional Random Fields) layer and relation extraction with multi-head selection, which predicts the most probable heads and corresponding relations for each token. Experimental results demonstrate the effectiveness of both models, with a clear improvement in F1 over the baseline of up to 10 percentage points when knowledge distillation and model ensembling are applied.

Finally, we create a triple trustworthiness measurement using knowledge graph resources. Common information extraction approaches use only the information in the raw text, ignoring the prior information contained in the knowledge graph of triples, such as its local topological structure and the statistical distribution of entity types. In the first stage, we use distant supervision to acquire new triples that exist in the knowledge resources but not in the prediction set. In the second stage, we use triple classification to select triples with a high confidence level. Triple classification judges whether a given triple is correct, framed as a binary classification problem. An XGBoost model combines effective features such as the statistical conditional distribution of entity-relation pairs in the training knowledge resources (SDValidate), triple scores based on knowledge embeddings such as TransE, and the rank of the neural model's output confidence score. Our method improves the performance of the neural extraction model by 2-4%. We also use SHAP values to analyze feature importance.

In summary, our work proposes two BERT-based pre-trained models for extracting multiple relations and entities, together with a measurement component for collecting and filtering triples. Experimental comparisons show the effectiveness and generality of our framework, which also shows promise for tasks such as question answering and recommendation systems.
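The scoring step of the multi-head selection formulation can be sketched in a few lines. The NumPy sketch below is illustrative only: it scores every (token, head, relation) combination with a small bilinear-style network, as in multi-head selection, but all names, dimension sizes, and the 0.9 threshold are hypothetical rather than taken from the thesis.

```python
import numpy as np

def multi_head_scores(H, U, W, V, b):
    """Score every (token i, head j, relation r) combination.

    H: (n, d) token encodings from the lower (BERT-like) layer.
    U, W: (d, k) projections for the dependent token and the candidate head.
    V: (k, R) relation-specific scoring vectors; b: (k,) bias.
    Returns an (n, n, R) array of independent sigmoid probabilities.
    """
    dep = H @ U                      # (n, k) dependent-token projection
    head = H @ W                     # (n, k) candidate-head projection
    # Broadcast to combine every token with every candidate head, then ReLU.
    z = np.maximum(dep[:, None, :] + head[None, :, :] + b, 0.0)  # (n, n, k)
    logits = z @ V                   # (n, n, R)
    return 1.0 / (1.0 + np.exp(-logits))

# Toy example: 3 tokens, hidden size 4, projection size 5, 2 relations.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
probs = multi_head_scores(H,
                          rng.normal(size=(4, 5)),
                          rng.normal(size=(4, 5)),
                          rng.normal(size=(5, 2)),
                          np.zeros(5))
# Keep every (token, head, relation) triple above a chosen threshold.
selected = [(int(i), int(j), int(r))
            for i, j, r in zip(*np.nonzero(probs > 0.9))]
```

Because each (head, relation) pair gets an independent sigmoid, a token may select several heads at once, which is what lets the joint model handle overlapping triples.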
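Two of the trustworthiness features described above can be sketched directly: a TransE-style plausibility score and an SDValidate-style conditional distribution over entity types. The sketch below uses a toy knowledge resource and hypothetical names; in the thesis, features of this kind are fed to an XGBoost classifier rather than used in isolation.

```python
import numpy as np
from collections import Counter

def transe_score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| suggests a more trustworthy triple."""
    return float(np.linalg.norm(h + r - t))

def sdvalidate_prob(typed_triples, s_type, relation, o_type):
    """Estimate P(relation | subject type, object type) from known triples."""
    pair = Counter((st, ot) for st, _, ot in typed_triples)
    joint = Counter(typed_triples)
    denom = pair[(s_type, o_type)]
    return joint[(s_type, relation, o_type)] / denom if denom else 0.0

# Toy knowledge resource of (subject type, relation, object type) triples.
kb = [("Person", "born_in", "City"),
      ("Person", "born_in", "City"),
      ("Person", "works_for", "Company"),
      ("Person", "lives_in", "City")]

# 2 of the 3 Person-City triples use born_in.
p = sdvalidate_prob(kb, "Person", "born_in", "City")

# A triple whose tail equals head + relation in embedding space scores 0 (best).
s = transe_score(np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

A low TransE distance and a high conditional probability both raise a candidate triple's trustworthiness; the gradient-boosted classifier learns how to weigh such signals against the neural model's own confidence rank.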
Keywords/Search Tags:relation extraction, pre-trained language model, named entity recognition, distant supervision, triple classification