Design And Implementation Of Data Synchronization System For Heterogeneous Database Under Large Amount Of Data

Posted on: 2021-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Xian
GTID: 2428330611464977
Subject: Electronic and communication engineering
Abstract/Summary:
The information age has made life more convenient, and it has also produced a large volume of data. As an effective means of data management, a database not only stores data but also supports analysis. As data volumes grow rapidly and data is used in increasingly different ways, databases face many new challenges. After continuous development, a variety of databases have emerged, each with its own focus and advantages. When managing and analyzing data, we are no longer limited to a single database product and instead choose the most appropriate solution for each need. As a result, production systems often use multiple databases at the same time, and moving data between heterogeneous databases requires data synchronization technology. With this technology, data is extracted from a source database, transformed, and then loaded into different target databases. Besides ensuring data consistency, data synchronization must also ensure the timeliness of the data and allow the same piece of data to be reused.

This thesis analyzes the advantages and characteristics of some common database products, as well as existing data synchronization products and projects. On this basis, it discusses a solution for synchronizing and reusing data between two or more heterogeneous databases. The solution extracts incremental data from the database log files, distributes the incremental data to a cache queue, and finally replicates it in one or more target databases to complete data synchronization and data reuse. Within this process, the thesis proposes several methods to improve synchronization efficiency: the Balanced Distribution of Data Algorithm, the Operation Merging Scheme, and the Window Cache Scheme.

The Balanced Distribution of Data Algorithm distributes the incremental data of each table according to the amount of data in the table, so that the incremental data is spread more evenly across the partitions of the queue and queue throughput improves. When business requirements change, the algorithm keeps the impact on the existing table distribution rules as small as possible. The Operation Merging Scheme merges operations that have the same primary key value, which reduces the number of operations actually executed and improves synchronization efficiency. The Window Cache Scheme dynamically adjusts the synchronization speed according to how busy the target database is, which reduces the likelihood that the target database crashes. In addition, the thesis proposes a method for automatic DDL processing based on the syntax tree of SQL statements, making data synchronization more intelligent. Finally, the feasibility of the scheme is verified through test experiments, which also yield performance indicators for data synchronization.
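
The abstract does not give the details of the Operation Merging Scheme, so the following is only a minimal sketch of the general idea of merging operations that share a primary key, not the thesis's actual implementation. The `Op` type, the `merge_ops` function, and the merge rules are assumptions made for illustration. The sketch further assumes that every INSERT carries a full row image and that UPDATE rows carry only the changed columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Op:
    """Hypothetical change record parsed from a database log."""
    kind: str                   # "INSERT", "UPDATE", or "DELETE"
    pk: int                     # primary key value of the affected row
    row: Optional[dict] = None  # column values; None for DELETE


def merge_ops(ops: List[Op]) -> List[Op]:
    """Collapse a batch so each primary key has at most one net operation."""
    merged: Dict[int, Op] = {}
    for op in ops:
        prev = merged.get(op.pk)
        if prev is None:
            # First operation seen for this key in the batch.
            merged[op.pk] = op
        elif prev.kind == "INSERT" and op.kind == "UPDATE":
            # Insert then update: a single insert of the final values.
            merged[op.pk] = Op("INSERT", op.pk, {**prev.row, **op.row})
        elif prev.kind == "INSERT" and op.kind == "DELETE":
            # Insert then delete: the row never reaches the target.
            del merged[op.pk]
        elif prev.kind == "UPDATE" and op.kind == "UPDATE":
            # Two updates: one update with the later values winning.
            merged[op.pk] = Op("UPDATE", op.pk, {**prev.row, **op.row})
        elif prev.kind == "UPDATE" and op.kind == "DELETE":
            # Update then delete: only the delete matters.
            merged[op.pk] = Op("DELETE", op.pk)
        elif prev.kind == "DELETE" and op.kind == "INSERT":
            # Delete then re-insert: treated as an update to the new row
            # (assumes the INSERT carries a full row image).
            merged[op.pk] = Op("UPDATE", op.pk, dict(op.row))
        else:
            raise ValueError(f"invalid sequence {prev.kind} -> {op.kind} for pk={op.pk}")
    return list(merged.values())


batch = [
    Op("INSERT", 1, {"id": 1, "v": "a"}),
    Op("UPDATE", 1, {"v": "b"}),
    Op("UPDATE", 2, {"v": "x"}),
    Op("DELETE", 2),
    Op("INSERT", 3, {"id": 3}),
    Op("DELETE", 3),
]
result = merge_ops(batch)
```

In this example six logged operations reduce to two operations on the target database: one insert for key 1 with the final values, and one delete for key 2. Key 3 disappears entirely because its insert and delete cancel out.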
Keywords/Search Tags: database, data synchronization, streaming data, log parsing, data reuse