Research And Implementation Of Distributed Heterogeneous Data Source Synchronization Framework

Posted on: 2019-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
GTID: 2348330566964279
Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet technology and the spread of intelligent terminal devices, the volume of data generated on the network keeps increasing. Computer storage technology is developing rapidly in the era of big data. In hardware, data storage media are continually improved and renewed; in software, many novel data storage systems based on new design ideas have been proposed. The value of data is also increasingly evident in a data-driven age. During a product's development, however, the requirements placed on data and the ways it is used often change, and the initial design of a data product's architecture is rarely perfect. For example, a product's data architecture may be designed to store user interaction data, but as the data and the business grow, the same data come to be used for visualization or decision analysis. The original data architecture then no longer fully meets the new requirements, and a new storage system may need to be adopted to improve it. Under these conditions, requirements for data synchronization or data migration gradually appear.

At present, some existing data synchronization tools serve only a few specific data storage systems and do not support other heterogeneous data sources. Other tools cannot run in a distributed fashion, so a single machine can become a performance bottleneck when synchronizing massive amounts of data. This thesis starts from the requirements of data synchronization. After studying and analyzing these application scenarios in depth, we propose a technical solution for data synchronization between heterogeneous data sources. The solution draws on the design ideas of existing tools and adopts a distributed, service-oriented approach, which addresses the single-machine performance bottleneck. It also supports both real-time and scheduled data synchronization.

Based on this solution, we implement a data synchronization framework that runs on a cluster of multiple computer nodes and provides synchronization as a service. Once the service is started, it runs continuously and waits to be called. Users can use the SDK provided by the framework to develop jobs for specific synchronization tasks and then submit the jobs to the service cluster. The cluster accepts a job and, after preprocessing operations such as parsing and verification, runs the data synchronization task. Finally, we tested the framework on different data sets. By analyzing the collected test data and the framework itself, we explain the problems that occurred during testing and point out directions for further optimization of the framework.
Keywords: Heterogeneous Data Source, Data Synchronization, Distributed, Big Data
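
The job workflow described in the abstract (a user builds a job, submits it to the cluster, and the cluster parses and verifies it before running it on a node) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual SDK: the names `SyncJob`, `parse_job`, `verify_job`, and `Cluster`, the job fields, and the round-robin node assignment are all hypothetical, and no real data source is contacted.

```python
from dataclasses import dataclass
from itertools import cycle

# Hypothetical names throughout; the thesis does not publish its SDK API here.
SUPPORTED_MODES = {"realtime", "scheduled"}
REQUIRED_FIELDS = {"source", "sink", "mode"}


@dataclass
class SyncJob:
    source: str  # identifier of the data source to read from
    sink: str    # identifier of the data source to write to
    mode: str    # "realtime" or "scheduled"


def parse_job(spec: dict) -> SyncJob:
    """Turn a submitted job description into a SyncJob, rejecting incomplete specs."""
    missing = REQUIRED_FIELDS - set(spec)
    if missing:
        raise ValueError(f"job spec is missing fields: {sorted(missing)}")
    return SyncJob(spec["source"], spec["sink"], spec["mode"])


def verify_job(job: SyncJob) -> None:
    """Check a parsed job before it is allowed to run."""
    if job.mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {job.mode!r}")
    if job.source == job.sink:
        raise ValueError("source and sink must differ")


class Cluster:
    """Long-running service stand-in that spreads accepted jobs across nodes."""

    def __init__(self, nodes: list[str]) -> None:
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self._next_node = cycle(nodes)  # assumed round-robin placement
        self.assignments: dict[str, list[SyncJob]] = {}

    def submit(self, spec: dict) -> str:
        """Parse, verify, and place a job; return the node chosen to run it."""
        job = parse_job(spec)
        verify_job(job)
        node = next(self._next_node)
        self.assignments.setdefault(node, []).append(job)
        return node


cluster = Cluster(["node-1", "node-2"])
first = cluster.submit({"source": "mysql", "sink": "hbase", "mode": "realtime"})
second = cluster.submit({"source": "mysql", "sink": "hdfs", "mode": "scheduled"})
```

Spreading jobs over several nodes is what keeps one machine from becoming the bottleneck; a real scheduler would presumably also weigh node load and data locality, which this sketch leaves out.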