Research On Methods Of Entity Resolution In Dataspaces

Posted on: 2020-05-25

Degree: Master

Type: Thesis

Country: China

Candidate: C Su

Full Text: PDF

GTID: 2428330575461954

Subject: Computer Science and Technology

Abstract/Summary:

Entity resolution is widely used in database management, data integration, and information retrieval to identify different records that refer to the same entity. Traditional entity resolution methods are designed mainly for data sets whose semantic mappings are known. In the information age, however, entity information is distributed across many data sources whose semantics are difficult to unify. Information collected by users may contain duplicates; storing it directly in the dataspace wastes storage space, and processing it wastes time and hardware resources. Entity resolution deduplicates the data and thereby serves as a data cleaning step.

To adapt to multi-source heterogeneous data environments such as dataspaces, this thesis proposes a record blocking method based on a record graph. In a preprocessing step, records that may match are placed in the same block, and exact matching is carried out only within each block, which improves computational efficiency. A record graph is constructed in which each data record is a node and each edge is weighted by the similarity between two records, computed as the weighted sum of their label similarity and relational similarity. Many record pairs have low similarity and are unlikely to match. A pruning method chosen according to the characteristics of the data set and the application requirements is therefore used to trim the record graph, which reduces redundant comparisons in entity resolution, and the pruned graph is then divided into blocks. The method is evaluated on a real data set, and the experimental results are analyzed.

Because differently named attributes within a block share attribute values, attribute mapping clusters are obtained by using attribute values to map the attributes of the records in the block. Since assigning higher weights to high-quality attributes helps improve the accuracy of entity resolution, weights are assigned to the mapping sets according to the computed goodness of each attribute mapping set, and the weighted sum of the similarities of the mapped attributes is compared with a preset threshold. When calculating the attribute similarity of a record pair, the edit distance between attribute values is calculated by the expression method, and the information of matched record pairs is then integrated. The integration principle is to merge the information that the paired records have in common and retain the information unique to each, so that the most complete entity content is returned to users. The method is evaluated on a real data set, and the analysis of the experimental results indicates that it adapts well to the dataspace environment.
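
The record-graph blocking step described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the Jaccard measure standing in for both label similarity and relational similarity, the weight `alpha`, the fixed-threshold pruning rule, the use of connected components as blocks, and every function and field name are hypothetical choices made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_record_graph(records, alpha=0.5):
    """Return {(i, j): weight} for every record pair with i < j.

    Each record is a dict with the hypothetical keys "labels" (attribute
    values treated as tags) and "relations" (names of related entities).
    The edge weight is a weighted sum of the two similarities.
    """
    edges = {}
    for i, j in combinations(range(len(records)), 2):
        label_sim = jaccard(records[i]["labels"], records[j]["labels"])
        rel_sim = jaccard(records[i]["relations"], records[j]["relations"])
        edges[(i, j)] = alpha * label_sim + (1 - alpha) * rel_sim
    return edges


def prune(edges, threshold):
    """Assumed pruning rule: drop edges whose weight is below the threshold."""
    return {pair: w for pair, w in edges.items() if w >= threshold}


def blocks_from_graph(n, edges):
    """Treat each connected component of the pruned graph as one block."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


records = [
    {"labels": {"john smith", "boston"}, "relations": {"acme"}},
    {"labels": {"john smith", "ma"}, "relations": {"acme"}},
    {"labels": {"mary jones", "paris"}, "relations": {"globex"}},
]
graph = prune(build_record_graph(records), threshold=0.5)
blocks = blocks_from_graph(len(records), graph)
print(blocks)  # [[0, 1], [2]]
```

In this toy run only the first two records stay connected after pruning, so exact matching would be attempted for that single pair rather than for all three pairs.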
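
The in-block matching and merging stage can be illustrated in a similar spirit. The sketch below assumes that the attribute mapping and its weights are already given (the abstract derives them from shared attribute values and a goodness score, which is not reproduced here), uses a normalized Levenshtein distance as the edit-distance similarity, and applies a hypothetical threshold of 0.8; all names and values are illustrative assumptions.

```python
def edit_distance(s, t):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_similarity(s, t):
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest


def record_similarity(r1, r2, mapping_weights):
    """Weighted sum of similarities over mapped attribute pairs.

    mapping_weights: {(attr_in_r1, attr_in_r2): weight}, assumed to sum to 1.
    """
    total = 0.0
    for (a1, a2), weight in mapping_weights.items():
        if a1 in r1 and a2 in r2:
            total += weight * string_similarity(r1[a1], r2[a2])
    return total


def merge(r1, r2, mapping_weights):
    """Keep r1's values for mapped attributes and add r2's unmapped ones."""
    mapped_in_r2 = {a2 for (a1, a2) in mapping_weights if a1 in r1}
    merged = dict(r1)
    for attr, value in r2.items():
        if attr not in mapped_in_r2 and attr not in merged:
            merged[attr] = value
    return merged


r1 = {"name": "Jon Smith", "city": "Boston"}
r2 = {"full_name": "John Smith", "town": "Boston", "phone": "555-0100"}
weights = {("name", "full_name"): 0.7, ("city", "town"): 0.3}

score = record_similarity(r1, r2, weights)
is_match = score >= 0.8  # hypothetical preset threshold
merged = merge(r1, r2, weights) if is_match else None
print(is_match, merged)
```

Here the two records differ by a single character in the name, so the weighted score clears the assumed threshold, and the merged record keeps the shared attributes once while retaining the phone number that only one record carries.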

Keywords/Search Tags:

Entity Resolution, Dataspace, Tag-style Blocking, Property Mapping, Information Merging

Related items

1	Research On Key Techniques Of Entity Resolution For Big Data Integration
2	Research On Named Entity Integration In Dataspace
3	Automated Comparative Table Generation For Facilitating Human Intervention In Multi-Entity Resolution
4	Research On Hybrid Human-Machine Based Entity Resolution Methods
5	Research On Keyword Query In Personal Dataspace Management System
6	Research On Key Technologies Of Integration And Query In Dataspace
7	Research On Entity Resolution For Heterogeneous Big Data Integration
8	Design and construction of an entity resolution system that supports entity identity information management and asserted resolution
9	Study On Entity Resolution Based On Semantics In Data Integration
10	Research On The Relevance Of Data In DataSpace