
Design And Implementation Of Data Flow For Database Replication

Posted on: 2016-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Luo
GTID: 2348330503494049
Subject: Software engineering
Abstract/Summary:
Near real-time database replication is an important component of enterprise database management systems, and the era of data explosion has a large and long-lasting impact on such systems. According to McKinsey's estimates, enterprise data volume will grow by 44% every year, and the total data volume stored in all enterprise information systems will grow 44-fold. To extract useful information from such a large volume of data, enterprises need appropriate tooling and architecture. However, according to a 2012 survey by InformationWeek Reports, more than 56% of enterprises do not treat “big data” differently, and more than 90% of enterprises still depend on traditional RDBMSs to store and process data. This means that, in the coming years, the RDBMS will continue to play an important role in data management, so database replication systems will have to carry ever larger data volumes between database systems.

The major contributions of this paper are:

1. Design and implement a task-based pipeline for better database replication performance. A task is a pure computing unit that can be executed by any thread in a thread pool; the task-based programming model decouples the computing unit from the executing thread so that each can be controlled separately. A programmer can add a new task without considering which thread it will run on, and a system administrator can tune the thread pool size to balance CPU utilization against replication performance (a minimal sketch follows this list).

2. Propose a way to improve replication performance by analyzing inter-transaction dependencies, so that transactions that are not inter-related can be replicated concurrently into the target database system, making better use of its capacity for concurrent execution.

3. Design a method to compute the “net data change”, reducing the number of data changes that must be applied to the target database; this improves replication performance in some situations (see the folding sketch below).

4. Combine smaller transactions into larger ones. An experiment shows that the commit rate of a database system is limited by the IOPS of its disk subsystem, and a simple algorithm is proposed to automatically tune the commit transaction size to approach the IOPS limit of the target database's disk subsystem.

5. Propose the use of prepared/dynamic SQL statement APIs and bulk-load APIs, which help improve replication performance and reduce latency (see the JDBC sketch below).

6. Propose a way to manage table metadata and the history of replication state changes, which makes database change replay easy to implement.
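To illustrate the task-based pipeline of contribution 1, here is a minimal Java sketch, not the thesis's actual implementation; the class name ApplyChangesTask, the payload, and the property name replication.poolSize are hypothetical. It shows the core idea: a task is a pure unit of work handed to a thread pool whose size is tuned independently of the tasks themselves.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // A replication task is a pure unit of work: it carries the data it needs
    // and does not care which thread will eventually run it.
    class ApplyChangesTask implements Runnable {
        private final String transactionLog;   // hypothetical payload: captured changes

        ApplyChangesTask(String transactionLog) {
            this.transactionLog = transactionLog;
        }

        @Override
        public void run() {
            // Apply the captured changes to the target database (details omitted).
            System.out.println("applying: " + transactionLog);
        }
    }

    public class TaskPipeline {
        public static void main(String[] args) {
            // The administrator tunes only this number to trade CPU utilization
            // against replication throughput; the tasks themselves are unchanged.
            int poolSize = Integer.getInteger("replication.poolSize", 4);
            ExecutorService pool = Executors.newFixedThreadPool(poolSize);

            for (int i = 0; i < 10; i++) {
                pool.submit(new ApplyChangesTask("txn-" + i));
            }
            pool.shutdown();
        }
    }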
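The “net data change” of contribution 3 can be pictured with the following simplified folding sketch (again an assumption-laden illustration, not the thesis's algorithm; the Change record and row keys are made up): successive changes to the same row key are folded, so an INSERT followed by UPDATEs becomes one INSERT with the final values, and an INSERT followed by a DELETE cancels out entirely.

    import java.util.LinkedHashMap;
    import java.util.Map;

    enum Op { INSERT, UPDATE, DELETE }

    // One captured row change: the operation, the row's key, and its final column values.
    record Change(Op op, String key, String values) {}

    public class NetChange {
        // Fold a stream of row changes into at most one net change per row key.
        static Map<String, Change> fold(Iterable<Change> changes) {
            Map<String, Change> net = new LinkedHashMap<>();
            for (Change c : changes) {
                Change prev = net.get(c.key());
                if (prev == null) {
                    net.put(c.key(), c);
                } else if (prev.op() == Op.INSERT && c.op() == Op.DELETE) {
                    net.remove(c.key());                                          // insert then delete: nothing to apply
                } else if (prev.op() == Op.INSERT) {
                    net.put(c.key(), new Change(Op.INSERT, c.key(), c.values())); // keep only the final values
                } else if (c.op() == Op.DELETE) {
                    net.put(c.key(), new Change(Op.DELETE, c.key(), null));
                } else {
                    // update-after-update (or delete-then-reinsert, treated here as an update)
                    net.put(c.key(), new Change(Op.UPDATE, c.key(), c.values()));
                }
            }
            return net;
        }

        public static void main(String[] args) {
            var changes = java.util.List.of(
                new Change(Op.INSERT, "k1", "v1"),
                new Change(Op.UPDATE, "k1", "v2"),    // folded into the INSERT of k1
                new Change(Op.INSERT, "k2", "x"),
                new Change(Op.DELETE, "k2", null));   // cancels the INSERT of k2
            System.out.println(fold(changes));        // only k1 remains, as an INSERT carrying v2
        }
    }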
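For contribution 5, the JDBC sketch below shows one way a prepared statement plus batching can cut per-statement parse and round-trip cost when applying changes, and how grouping many changes under one commit relates to the IOPS limit discussed in contribution 4. The connection URL, credentials, and table net_orders are invented for the example; a vendor-specific bulk-load API would go further but is not shown.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchApply {
        public static void main(String[] args) throws SQLException {
            // Hypothetical target database URL, credentials, and table.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://target/db", "repl", "secret")) {
                conn.setAutoCommit(false);    // group many changes into one commit
                String sql = "INSERT INTO net_orders (id, amount) VALUES (?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (int id = 1; id <= 1000; id++) {
                        ps.setInt(1, id);
                        ps.setDouble(2, id * 1.5);
                        ps.addBatch();        // statement is parsed once, executed many times
                    }
                    ps.executeBatch();
                }
                conn.commit();                // one commit instead of 1000, easing the disk IOPS limit
            }
        }
    }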
Keywords/Search Tags: database replication, replication performance, replication performance test, data flow architecture