
The Design And Implementation Of Data Migration System For Multiple Data Sources

Posted on: 2021-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: X H Liu
Full Text: PDF
GTID: 2518306557489664
Subject: Software engineering

Abstract/Summary:
Data migration refers to the process of transferring data from one storage system to another, including the selection, extraction, and conversion of data. Existing migration tools are typically designed to convert between two specific kinds of databases, which often leads to limited migration functionality, poor scalability, and poor migration performance, so they cannot meet enterprise demands for multi-source heterogeneous data integration, backup, and analysis in the era of big data. This thesis implements a data migration system that supports conversion among multiple heterogeneous data models, allows new data sources to be added, and uses Hadoop cluster parallel computing to maintain high migration performance in big data scenarios. The main work is summarized as follows:

1. Propose a method for transforming between heterogeneous schemas through an intermediate data format. The method provides an abstract, common representation for both ends of a migration. Unlike direct migration, it first converts the source data format to the intermediate format, then converts the intermediate format to the target format. Using the intermediate format as a pivot effectively realizes transformation between diverse data formats, reduces the number of converters needed from one per source-target pair to one per data source, and improves scalability when adding new data sources.

2. Design and implement a data conversion module that supports extension to multiple data sources through a plug-in architecture. The module encapsulates the general migration process between data sources as abstract classes, which are packaged into an SDK for secondary development; a completed plug-in is then submitted to the system, where it is parsed and run.

3. Design and implement a high-performance parallel task execution module that performs parallel migration based on the MapReduce framework. The module calls the data source plug-in, obtains task information, gathers statistics, evenly divides the data, packages each division into a Map task, and submits the tasks to the distributed cluster for parallel computation, which improves migration efficiency.

4. Design and implement a system management module and a user interaction module, providing an API based on the MVC architecture that is convenient for user interaction. The management module implements back-end data management, including plug-in parsing, data storage, and message transmission. It also lets users call RESTful APIs to perform common management functions, including adding and deleting data sources, submitting data migration tasks, and viewing migration task status.

Testing verified the system's support for a variety of heterogeneous data sources. Data source plug-ins can be added and loaded without recompiling the system code, making data source support extensible. The system correctly migrates data between homogeneous or heterogeneous data sources according to the user's task configuration. In addition, when deployed on a cluster, the system can scale out by adding nodes, significantly improving the performance of big data migration.
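The intermediate-format pivot described in point 1 can be illustrated with a minimal sketch. All names here (fromCsv, toJson, the map-based record) are illustrative assumptions, not the thesis's actual format: the idea is only that each source converts into a common record representation, and each target converts out of it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: an ordered field-name -> value map serves as the
// intermediate data format between a source format and a target format.
public class PivotFormatDemo {
    // Source-side converter: CSV header + row -> intermediate record.
    static Map<String, String> fromCsv(String header, String row) {
        String[] names = header.split(",");
        String[] values = row.split(",");
        Map<String, String> rec = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) rec.put(names[i], values[i]);
        return rec;
    }

    // Target-side converter: intermediate record -> JSON-like string.
    static String toJson(Map<String, String> rec) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : rec.entrySet()) {
            if (sb.length() > 1) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":\"")
              .append(e.getValue()).append("\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> rec = fromCsv("id,name", "1,alice");
        System.out.println(toJson(rec)); // prints {"id":"1","name":"alice"}
    }
}
```

With this shape, supporting a new data source only requires writing its two converters to and from the intermediate record, rather than one converter per existing format.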
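The plug-in architecture of point 2 can be sketched as an abstract base class that the SDK would expose for secondary development. The class and method names (MigrationPlugin, read, write, migrate) are hypothetical; the point is that the engine depends only on the abstraction, so new data sources plug in without changes to the engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a plug-in contract: each data source extends the
// abstract class; the migration engine works against the abstraction only.
public class PluginDemo {
    // Abstract base class that an SDK would package for plug-in developers.
    static abstract class MigrationPlugin {
        abstract List<Map<String, String>> read();            // source side
        abstract void write(List<Map<String, String>> recs);  // target side
    }

    // Example plug-in: an in-memory store standing in for a real database driver.
    static class MemoryPlugin extends MigrationPlugin {
        final List<Map<String, String>> store = new ArrayList<>();
        List<Map<String, String>> read() { return store; }
        void write(List<Map<String, String>> recs) { store.addAll(recs); }
    }

    // The engine moves records between any two plug-ins via the common contract
    // and returns the number of records migrated.
    static int migrate(MigrationPlugin source, MigrationPlugin target) {
        List<Map<String, String>> recs = source.read();
        target.write(recs);
        return recs.size();
    }

    public static void main(String[] args) {
        MemoryPlugin src = new MemoryPlugin();
        src.write(List.of(Map.of("id", "1"), Map.of("id", "2")));
        MemoryPlugin dst = new MemoryPlugin();
        System.out.println(migrate(src, dst)); // prints 2
    }
}
```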
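The even-division step of point 3 can be sketched as follows. This is an assumption about how "statistics and even division" might work, not the thesis's actual algorithm: a total row count is split into near-equal contiguous ranges, each of which would become one Map task submitted to the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: divide [0, totalRows) into `tasks` contiguous ranges
// whose sizes differ by at most one row, one range per Map task.
public class SplitDemo {
    static List<long[]> split(long totalRows, int tasks) {
        List<long[]> ranges = new ArrayList<>();
        long base = totalRows / tasks;   // minimum rows per task
        long rem = totalRows % tasks;    // first `rem` tasks take one extra row
        long start = 0;
        for (int i = 0; i < tasks; i++) {
            long size = base + (i < rem ? 1 : 0);
            ranges.add(new long[]{start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(10, 3)) System.out.println(r[0] + "-" + r[1]);
        // prints 0-4, 4-7, 7-10
    }
}
```

Balanced splits matter here because the slowest Map task bounds the whole job's completion time; uneven divisions would leave some cluster nodes idle while others finish a disproportionate share of rows.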
Keywords/Search Tags: Heterogeneous data source, Data migration, Big data, MapReduce