
Entity Resolution With Deep Learning

Posted on: 2022-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Nie
Full Text: PDF
GTID: 2518306524480334
Subject: Computer Science and Technology
Abstract/Summary:
Data integration is a key task in the field of information retrieval. Entity resolution (ER), also known as entity matching or duplicate record detection, is a key step of data integration. ER aims to find data records that refer to the same real-world entity across data from different sources.

Early research was dedicated to devising various string-based distance functions. Such unsupervised approaches lack effectiveness and generality, since no single metric fits all datasets, and thresholds must be tuned manually for each dataset. With the availability of crowd workers, an alternative research branch leverages human intervention in the loop. However, such hybrid human-assisted approaches are not scalable due to financial budget constraints. In recent years, research has mainly focused on machine-learning-based algorithms. These approaches view ER as a binary classification task and apply traditional classifiers to hand-crafted features. They improve ER accuracy to a certain extent, but the dependency on manual feature engineering still hinders generality and robustness. Recently, with the popularity of deep learning, some works have improved the performance of ER by devising effective end-to-end deep learning models. Since existing models simply adopt vanilla RNNs to model sequential information, their architectures are rather simple. Previous studies failed to capture the saliency of words effectively, failed to identify the importance of different attributes for structured ER, and did not use the recently popular pre-trained language models, so there is still plenty of room for accuracy improvement.

In this paper, we propose a multi-context attention mechanism (MCA) to fully exploit the semantic context and capture highly discriminative terms. First, self-attention is used to learn dependencies between words in a sentence. Second, pair-attention analyzes both input sequences jointly while learning a similarity representation. Third, global-attention is used to assign high weight to discriminative terms. To support structured datasets with multiple attributes, we further propose attribute attention to distinguish important attributes. We conduct extensive experiments on 7 publicly accessible benchmark datasets. The experimental results clearly establish our superiority over previous studies. Besides, with the popularity of pre-trained language models (PLMs), we also apply pre-trained language models to ER. On 6 textual datasets, the model with a PLM is superior to MCA, further improving performance and generalization. Finally, based on the current research status, we discuss the challenges and opportunities for further research.
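To make the limitation of the early unsupervised approaches concrete, the following is a minimal sketch of threshold-based ER with a string-based similarity function. Token-level Jaccard similarity is used here purely for illustration; the record strings and the threshold value are hypothetical, not drawn from the thesis, and the manually chosen threshold is exactly what fails to generalize across datasets.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def match(record_a: str, record_b: str, threshold: float = 0.5) -> bool:
    """Declare a match when similarity exceeds a manually chosen threshold.

    The threshold must be re-tuned for every dataset, which is the
    generality problem described above.
    """
    return jaccard(record_a, record_b) >= threshold
```

For example, `match("Apple iPhone 12 64GB", "apple iphone 12 64gb black")` is a match under this threshold, while a threshold that works for product titles may be far too loose or too strict for, say, bibliographic records.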
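The attention mechanisms above all build on the same scaled dot-product primitive. The sketch below, in NumPy, shows that primitive, not the thesis's actual MCA model: the embedding dimension, sequence length, and random inputs are illustrative assumptions. Self-attention takes queries, keys, and values from one sequence; pair-attention would instead take queries from one input record and keys/values from the other.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # pairwise word-word affinities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))   # one sequence: 4 "words", 8-dim embeddings
out, w = attention(X, X, X)   # self-attention: Q, K, V from the same sequence
```

Pair-attention would be `attention(X_left, X_right, X_right)` for two record embeddings, and global-attention corresponds to weighting terms by a learned global query; both reuse this same computation.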
Keywords/Search Tags:Entity Resolution, Deep Learning, Attention Mechanism, Natural Language Processing