Research And Implementation Of Distributed Heterogeneous Data Source Synchronization Framework

Posted on:2019-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:Z Wang

Full Text:PDF

GTID:2348330566964279

Subject:Computer Science and Technology

Abstract/Summary:

With the development of Internet technology and the spread of smart terminal devices, the volume of data generated on the network keeps growing. Computer storage technology is developing rapidly in the era of big data. On the hardware side, data storage media are continually improved and renewed; on the software side, many novel data storage systems based on new design ideas have been proposed. The value of data is also increasingly prominent in this data-driven era. However, as a product evolves, the requirements for and applications of its data often change, and the initial design of its data architecture is rarely perfect. For example, a product's data architecture may be designed to store user interaction data, but as the data and the business grow, that data comes to be used for visualization or decision analysis. The original data architecture then no longer fully meets the new requirements, and a new storage system may need to be introduced to improve it. Under these conditions, requirements for data synchronization or data migration gradually appear. At present, some existing data synchronization tools serve only a few specific data storage systems and do not support other heterogeneous data sources. Other tools cannot run in a distributed manner, so a single machine can become a performance bottleneck when synchronizing massive data. This thesis starts from the requirements of data synchronization. After studying and analyzing these application scenarios in depth, we propose a technical solution for data synchronization between heterogeneous data sources. The solution draws on the design ideas of existing tools and adopts distributed and service-oriented approaches, so it can overcome the performance bottleneck of a single machine. It also supports both real-time and scheduled (timing) data synchronization. Based on this solution, we implement a data synchronization framework that runs on a cluster of multiple computer nodes and provides synchronization as a service. Once the service is started, it runs continuously and waits to be called. Users can use the SDK provided by the framework to develop a job for a specific synchronization task and then submit the job to the service cluster. The cluster accepts the job and, after complex preprocessing operations such as parsing and verification, runs the data synchronization task. Finally, we test the framework on different data sets. By analyzing the collected data and the framework, we explain the problems that occurred during testing and point out directions for further optimization of the framework.
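The job flow described in the abstract (build a job with the SDK, submit it, have the service verify it, then hand it to a node) can be illustrated with a minimal single-process sketch. Everything named here is hypothetical and not taken from the thesis: `SyncJob`, `validate`, `SyncService`, the `"timing"`/`"realtime"` mode strings, and the round-robin node assignment are all assumptions chosen only to make the flow concrete.

```python
from dataclasses import dataclass

# Assumed mode names for scheduled and real-time synchronization.
SUPPORTED_MODES = {"timing", "realtime"}


@dataclass
class SyncJob:
    """Hypothetical job description a user might build with an SDK."""
    source: str           # e.g. "mysql://orders" (illustrative URI)
    sink: str             # e.g. "hdfs://warehouse/orders" (illustrative URI)
    mode: str = "timing"  # "timing" (scheduled) or "realtime"


def validate(job):
    """Return a list of problems; an empty list means the job is accepted."""
    errors = []
    if not job.source:
        errors.append("source is empty")
    if not job.sink:
        errors.append("sink is empty")
    if job.source and job.source == job.sink:
        errors.append("source and sink are identical")
    if job.mode not in SUPPORTED_MODES:
        errors.append(f"unsupported mode: {job.mode}")
    return errors


class SyncService:
    """Stand-in for the long-running service cluster (one process here)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.assignments = {node: [] for node in self.nodes}
        self._next = 0  # index of the next node in round-robin order

    def submit(self, job):
        """Verify the job, then assign it to a node (round-robin assumed)."""
        errors = validate(job)
        if errors:
            raise ValueError("; ".join(errors))
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        self.assignments[node].append(job)
        return node


if __name__ == "__main__":
    service = SyncService(["node-1", "node-2"])
    job = SyncJob(source="mysql://orders", sink="hdfs://warehouse/orders")
    print(service.submit(job))
```

Rejecting a job at submission time, before any node is chosen, mirrors the abstract's ordering of parsing and verification ahead of execution; a real cluster would also need scheduling, failure handling, and the actual reader/writer plugins for each data source.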

Keywords/Search Tags:

Heterogeneous Data Source, Data Synchronization, Distributed, Big Data

Related items

1	The Design And Implementation Of One Kind Of Common Distributed Data Synchronization System
2	Research Of Distributed Integration System Of Heterogeneous Data Sources
3	Design And Implementation Of A Heterogeneous Data Source Exchange System Based On Spark
4	Synchronization Of Heterogeneous Data Sources Based On The SyncML Protocol And Implementation
5	Research On The Application Of Heterogenous Data Source Integration In University Data Center Project
6	Design And Implementation Of MES Material Management Data Integration System Based On Multi-source Heterogeneous
7	Research And Application Of Model Data For A Distributed Database Data Synchronization Which Based On SOA
8	Integrated Research And Implementation Of Internet-based Heterogeneous Data Source
9	Research And Design For Data Synchronization On Heterogeneous Distributed Database
10	Research And Implementation On The Interoperation Method Of The Multi-source And Heterogeneous Data