Research On Entity Resolution Method Of Industrial Internet Of Things Data

Posted on: 2021-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: H Zeng
Full Text: PDF
GTID: 2428330611453115
Subject: Computer application technology
Abstract/Summary:
Processing low-quality data in the Internet of Things (IoT) has long been a research hotspot, and entity resolution methods that address entity-description conflicts in duplicated data have attracted particularly wide attention. Entity resolution is the task of finding, within a data set, the different records that describe the same real-world entity. Because Industrial Internet of Things (IIoT) data fluctuates and arrives in real time, existing entity resolution methods cannot resolve it both accurately and efficiently. This thesis therefore proposes a progressive entity resolution algorithm for historical IIoT data and, building on it, an incremental entity resolution algorithm for real-time data. The main contributions are as follows.

To improve the accuracy of entity resolution for IIoT data, the thesis first addresses the problem that the data types of attributes in IIoT data are not fully consistent, proposing a data-type-insensitive attribute judgment method based on hash coding. Based on this method, it defines a series of entity match conditions and constructs an entity match rule, which exploits the uniqueness of hash codes to achieve high-precision matching. Then, to handle the fluctuating nature of IIoT data, a progressive entity resolution algorithm for historical data is proposed, based on the entity match rule and the idea of the Merkle-tree. The algorithm first applies a data standardization method to eliminate the impact of data fluctuation on resolution accuracy. Next, given the massive volume of IIoT data, the thesis modifies the Merkle-tree structure to keep entity resolution efficient. The algorithm uses the modified Merkle-tree structure to hash the attribute values of each attribute column progressively, which avoids a large number of unnecessary hash coding and attribute comparison operations and so preserves efficiency while improving accuracy.

To improve the efficiency of entity resolution for IIoT data, the thesis first optimizes the chain structure (St-Chain) to meet the strict real-time requirements of incremental data processing. Based on the optimized St-Chain, it proposes an incremental entity resolution algorithm for real-time data (IER-RT). Then, to improve the scalability of the IER-RT algorithm and make its core idea easier to adopt and integrate in other fields, the thesis abstracts the essence of the algorithm, optimizes and improves it, and derives a general entity match rule for real-time incremental data, together with a similarity measurement formula suited to that rule. Experimental results show that the proposed entity resolution method for IIoT data performs well in both accuracy and efficiency.
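The idea of type-insensitive attribute hashing with progressive, column-by-column comparison can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not give its match conditions, its standardization method, or its modified Merkle-tree, so the sketch substitutes a simple running hash chain. The function names (`canonical`, `attr_hash`, `progressive_match`) and the normalization rule (numeric values and numeric strings are treated as equal) are assumptions made only for illustration.

```python
import hashlib


def canonical(value):
    """Map a value to a type-independent string.

    Assumption for this sketch: anything parseable as a number is
    normalized through float, so 12, 12.0 and "12.0" coincide;
    everything else is compared as trimmed, lower-cased text.
    """
    try:
        return repr(float(value))
    except (TypeError, ValueError, OverflowError):
        return str(value).strip().lower()


def attr_hash(value):
    """Hash one attribute value after type-insensitive normalization."""
    return hashlib.sha256(canonical(value).encode("utf-8")).hexdigest()


def progressive_match(rec_a, rec_b):
    """Compare two records one attribute column at a time.

    A running hash is extended with each column's hash, and the
    comparison stops at the first column where the running hashes
    differ, so later columns are never hashed for non-matching
    records. This is a simplified chain, not a Merkle tree.
    """
    if len(rec_a) != len(rec_b):
        return False
    acc_a = acc_b = ""
    for a, b in zip(rec_a, rec_b):
        acc_a = hashlib.sha256((acc_a + attr_hash(a)).encode("utf-8")).hexdigest()
        acc_b = hashlib.sha256((acc_b + attr_hash(b)).encode("utf-8")).hexdigest()
        if acc_a != acc_b:
            return False  # early exit: remaining columns are skipped
    return True
```

In this sketch, the records `[12, "Pump-A", "3.50"]` and `["12.0", "pump-a ", 3.5]` are treated as the same entity despite their differing attribute types, while records that differ in an early column are rejected without hashing the remaining columns.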
Keywords/Search Tags: IIoT, entity-description conflict on duplicated data, Merkle-tree, incremental entity resolution