
Design And Implementation Of Data Flow For Database Replication

Posted on: 2016-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Luo
GTID: 2348330503494049
Subject: Software engineering
Abstract/Summary:
Near real-time database replication is an important component of enterprise database management systems, and the era of data explosion has a large and long-lasting impact on such systems. According to McKinsey's estimates, enterprise data volume will grow by 44% every year, and the total data volume stored in all enterprise information systems will grow 44-fold. To extract useful information from such a large volume of data, enterprises need appropriate tooling and architecture. However, according to a 2012 survey by InformationWeek Reports, more than 56% of enterprises do not treat “big data” differently, and more than 90% of enterprises still depend on traditional RDBMSs to store and process data. This means that, in the coming years, the RDBMS will continue to play an important role in data management, so database replication systems will have to carry ever larger data volumes between database systems.

The major contributions of this paper are:

1. Design and implement a task-based pipeline for better database replication performance. A task is a pure computing unit that can be executed by any thread in a thread pool; the task-based programming model decouples the computing unit from the executing thread so that each can be controlled separately. A programmer can add a new task without considering which thread it will run on, and a system administrator can tune the thread pool size to balance CPU utilization against replication performance (a minimal sketch follows this list).

2. Propose a way to improve replication performance by analyzing inter-transaction dependencies, so that transactions that are not inter-related can be replicated concurrently into the target database system, making better use of its capacity for concurrent execution.

3. Design a method to compute the “net data change”, reducing the number of data changes that must be applied to the target database; this improves replication performance in some situations (see the folding sketch below).

4. Combine smaller transactions into larger ones. An experiment shows that the commit rate of a database system is limited by the IOPS of its disk subsystem, and a simple algorithm is proposed to automatically tune the commit transaction size to approach the IOPS limit of the target database's disk subsystem.

5. Propose the use of prepared/dynamic SQL statement APIs and bulk-load APIs, which help improve replication performance and reduce latency (see the JDBC sketch below).

6. Propose a way to manage table metadata and the history of replication state changes, which makes database change replay easy to implement.
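To illustrate the task-based pipeline of contribution 1, here is a minimal Java sketch, not the thesis's actual implementation; the class name ApplyChangesTask, the payload, and the property name replication.poolSize are hypothetical. It shows the core idea: a task is a pure unit of work handed to a thread pool whose size is tuned independently of the tasks themselves.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // A replication task is a pure unit of work: it carries the data it needs
    // and does not care which thread will eventually run it.
    class ApplyChangesTask implements Runnable {
        private final String transactionLog;   // hypothetical payload: captured changes

        ApplyChangesTask(String transactionLog) {
            this.transactionLog = transactionLog;
        }

        @Override
        public void run() {
            // Apply the captured changes to the target database (details omitted).
            System.out.println("applying: " + transactionLog);
        }
    }

    public class TaskPipeline {
        public static void main(String[] args) {
            // The administrator tunes only this number to trade CPU utilization
            // against replication throughput; the tasks themselves are unchanged.
            int poolSize = Integer.getInteger("replication.poolSize", 4);
            ExecutorService pool = Executors.newFixedThreadPool(poolSize);

            for (int i = 0; i < 10; i++) {
                pool.submit(new ApplyChangesTask("txn-" + i));
            }
            pool.shutdown();
        }
    }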
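The “net data change” of contribution 3 can be pictured with the following simplified folding sketch (again an assumption-laden illustration, not the thesis's algorithm; the Change record and row keys are made up): successive changes to the same row key are folded, so an INSERT followed by UPDATEs becomes one INSERT with the final values, and an INSERT followed by a DELETE cancels out entirely.

    import java.util.LinkedHashMap;
    import java.util.Map;

    enum Op { INSERT, UPDATE, DELETE }

    // One captured row change: the operation, the row's key, and its final column values.
    record Change(Op op, String key, String values) {}

    public class NetChange {
        // Fold a stream of row changes into at most one net change per row key.
        static Map<String, Change> fold(Iterable<Change> changes) {
            Map<String, Change> net = new LinkedHashMap<>();
            for (Change c : changes) {
                Change prev = net.get(c.key());
                if (prev == null) {
                    net.put(c.key(), c);
                } else if (prev.op() == Op.INSERT && c.op() == Op.DELETE) {
                    net.remove(c.key());                                          // insert then delete: nothing to apply
                } else if (prev.op() == Op.INSERT) {
                    net.put(c.key(), new Change(Op.INSERT, c.key(), c.values())); // keep only the final values
                } else if (c.op() == Op.DELETE) {
                    net.put(c.key(), new Change(Op.DELETE, c.key(), null));
                } else {
                    // update-after-update (or delete-then-reinsert, treated here as an update)
                    net.put(c.key(), new Change(Op.UPDATE, c.key(), c.values()));
                }
            }
            return net;
        }

        public static void main(String[] args) {
            var changes = java.util.List.of(
                new Change(Op.INSERT, "k1", "v1"),
                new Change(Op.UPDATE, "k1", "v2"),    // folded into the INSERT of k1
                new Change(Op.INSERT, "k2", "x"),
                new Change(Op.DELETE, "k2", null));   // cancels the INSERT of k2
            System.out.println(fold(changes));        // only k1 remains, as an INSERT carrying v2
        }
    }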
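For contribution 5, the JDBC sketch below shows one way a prepared statement plus batching can cut per-statement parse and round-trip cost when applying changes, and how grouping many changes under one commit relates to the IOPS limit discussed in contribution 4. The connection URL, credentials, and table net_orders are invented for the example; a vendor-specific bulk-load API would go further but is not shown.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchApply {
        public static void main(String[] args) throws SQLException {
            // Hypothetical target database URL, credentials, and table.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://target/db", "repl", "secret")) {
                conn.setAutoCommit(false);    // group many changes into one commit
                String sql = "INSERT INTO net_orders (id, amount) VALUES (?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (int id = 1; id <= 1000; id++) {
                        ps.setInt(1, id);
                        ps.setDouble(2, id * 1.5);
                        ps.addBatch();        // statement is parsed once, executed many times
                    }
                    ps.executeBatch();
                }
                conn.commit();                // one commit instead of 1000, easing the disk IOPS limit
            }
        }
    }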
Keywords/Search Tags: database replication, replication performance, replication performance test, data flow architecture