A Siamese Recurrent Neural Network For Entity Alignment

Posted on: 2019-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lv
Full Text: PDF
GTID: 2348330545477895
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of modern information technology, the volume of data generated by humans has grown explosively. Because unified data specifications are lacking, and because system or human errors occur during data collection, storage, and use, these massive data sets contain many inconsistencies and much redundancy. As a result, conclusions drawn from the data can be wrong or even contradictory, and improving data quality has become a prominent issue in current information science research.

Entity alignment, the task of determining whether two records refer to the same entity, has a wide range of applications in both data cleaning and data integration, and it lets us resolve inconsistencies in data sets. Traditional approaches either use string metrics to calculate a matching score for two records or apply a conventional machine learning technique to features manually extracted from pairs of records. The effectiveness of these methods depends largely on designing good domain-specific string metrics or on manually extracting discriminative features. Traditional learning-based methods also often ignore the contextual semantic information in text data when constructing features.

In this thesis, we study the application of recurrent neural networks to entity alignment and propose two basic entity alignment methods based on siamese recurrent neural networks: Word-based MaLSTM and Character-based biLSTM. Each method is an end-to-end deep network model that applies a recurrent neural network to capture contextual semantic features from the data automatically, without needing any string metrics. To address the information gaps between attribute fields in data such as citation records, we further propose a new entity alignment method based on a joint multi-field siamese recurrent neural network, JMFS RNN. JMFS RNN uses a different recurrent neural network cell for each attribute field, chosen according to that field's text characteristics, and combines the features captured from all fields. This per-field processing both mines the features of each attribute field effectively and avoids the influence of the information gaps.

We compare the three proposed methods with several traditional entity alignment methods on two public data sets, Cora and Citeseer. The experimental results show that our methods learn discriminative features effectively and outperform the traditional methods.
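As a rough illustration of the siamese idea described in the abstract, the sketch below encodes two records with the same recurrent encoder (shared weights) and scores them by a Manhattan-distance similarity, exp(-L1), the similarity usually associated with MaLSTM-style models. It is not the thesis's implementation: it assumes a plain tanh recurrent cell in place of an LSTM, random untrained weights, and a hashed one-hot character embedding, and the names `make_rnn`, `char_vector`, `encode`, `manhattan_similarity`, and `siamese_score` are hypothetical.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, seed=0):
    """Create random weights for a plain tanh recurrent cell (assumption: not an LSTM)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W_in": matrix(hidden_dim, input_dim), "W_h": matrix(hidden_dim, hidden_dim)}


def char_vector(ch, dim):
    """Hypothetical embedding: one-hot vector indexed by the character code modulo dim."""
    vec = [0.0] * dim
    vec[ord(ch) % dim] = 1.0
    return vec


def encode(text, params):
    """Run the recurrent cell over the characters and return the final hidden state."""
    input_dim = len(params["W_in"][0])
    hidden_dim = len(params["W_h"])
    h = [0.0] * hidden_dim
    for ch in text.lower():
        x = char_vector(ch, input_dim)
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(params["W_in"][i], x))
                + sum(w * hj for w, hj in zip(params["W_h"][i], h))
            )
            for i in range(hidden_dim)
        ]
    return h


def manhattan_similarity(h_a, h_b):
    """Map the L1 distance between two encodings to a score in (0, 1]."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h_a, h_b)))


def siamese_score(record_a, record_b, params):
    """Siamese scoring: both records pass through the same encoder and weights."""
    return manhattan_similarity(encode(record_a, params), encode(record_b, params))


if __name__ == "__main__":
    params = make_rnn(input_dim=32, hidden_dim=8, seed=1)
    print(siamese_score("J. Smith, Neural nets, 1998", "John Smith, Neural nets, 1998", params))
```

Because the weights here are untrained, the scores carry no meaning beyond showing the data flow; in a trained model the shared encoder would be fitted so that matching records score near 1 and non-matching records near 0. A joint multi-field variant along the lines of JMFS RNN would presumably apply one such encoder per attribute field and combine the per-field features before scoring.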
Keywords/Search Tags: Data Inconsistency, Data Quality, Entity Alignment, Siamese Recurrent Neural Network