
Implementation And Optimization Of Snapshot Isolation In Scalable Database System

Posted on: 2019-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: B Xiao
GTID: 2428330566960772
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and the proliferation of data-intensive applications, a growing number of Internet applications employ scalable database systems to meet the increasing demand for data storage and access. Because separating incremental data from baseline data combines the large storage capacity of disk with the high update performance of memory, log-structured merge-tree (LSM-tree) style storage has been widely adopted in distributed data storage systems (e.g., BigTable and HBase). However, storage systems of this kind usually do not support database transaction properties, so they cannot be directly applied to mission-critical applications such as those in finance and telecom. Google's Spanner provides the ACID properties of transaction processing through distributed concurrency control based on BigTable storage. Nevertheless, because data partitioning limits system performance, Spanner places high requirements on upper-layer applications. With declining memory prices and the increasing number of processing units in a single processor, using a single independent physical server node to process incremental data, thereby avoiding complicated distributed transactions, has become a good choice (e.g., OceanBase, CEDAR).

However, this kind of system architecture cannot serve transactional read operations efficiently. First, to guarantee the correctness of transaction processing, every transactional read needs to fetch both baseline data and incremental data from different storage nodes, which increases the processing time and the likelihood of conflicts among transactions. Second, the single transaction processing node can become a system bottleneck, because every transactional read needs to access incremental data.

To address these two problems, this dissertation first adopts the snapshot isolation level to exploit the advantages of multi-version incremental data storage: it ensures the correctness of transaction processing while keeping reads and writes from blocking each other, which improves performance. Second, for read-intensive applications, this dissertation provides an efficient data access method that maintains a Bloom filter over the incremental data. With this method, empty reads on the transaction processing node can be recognized in advance, which improves overall throughput by reducing the read latency caused by overload of the single node.

The main contributions of this dissertation are as follows:

1. We give a basic implementation of snapshot isolation in a scalable database system model based on the LSM-tree, analyze the regular merge process, identify the issues that snapshot isolation needs to address, and offer solutions. We also discuss efficient failure recovery and replica management strategies that preserve the correctness of snapshot isolation.
2. We analyze how data access is executed in the transaction processing node, including the working mechanisms of the network I/O threads and worker threads. Considering the data distribution in most application scenarios, we improve this mechanism by optimizing empty reads to raise snapshot read performance.
3. We implement snapshot isolation and the snapshot read optimization in the open-source system CEDAR. A sufficient number of experiments are conducted to evaluate the performance of snapshot isolation and the improvement gained from the optimization based on empty-read filtering.

Experimental results indicate that the proposed implementation of snapshot isolation and the proposed snapshot read optimization allow the system to provide excellent scalable throughput and availability. This work is a useful attempt at high-performance transactional data access in LSM-tree-based scalable database systems and may serve as a reference for similar systems.
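
The two ideas summarized above, snapshot reads over multi-version incremental data and Bloom-filter detection of empty reads, can be illustrated with a minimal single-process sketch. It is not the CEDAR implementation: the class names (BloomFilter, SnapshotStore), the logical commit counter, the filter size, and the choice to keep the filter next to the store are all assumptions made for illustration. In the sketch, a read consults the filter first and touches the incremental data only when the key may have been updated; otherwise it is answered from baseline data alone.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SnapshotStore:
    """Hypothetical model: read-only baseline plus multi-version increments."""

    def __init__(self, baseline):
        self.baseline = dict(baseline)  # stands in for on-disk baseline data
        self.incremental = {}           # key -> [(commit_ts, value)], ascending
        self.filter = BloomFilter()     # tracks keys present in incremental data
        self.commit_ts = 0              # logical commit timestamp (assumption)
        self.incremental_lookups = 0    # how often incremental data was probed

    def write(self, key, value):
        # Each write commits a new version; older versions stay readable.
        self.commit_ts += 1
        self.incremental.setdefault(key, []).append((self.commit_ts, value))
        self.filter.add(key)
        return self.commit_ts

    def snapshot(self):
        # A snapshot is the latest commit timestamp visible to the reader.
        return self.commit_ts

    def read(self, key, snapshot_ts):
        # Skip the incremental probe when the filter rules the key out
        # (an "empty read"); a false positive only costs one extra probe.
        if self.filter.might_contain(key):
            self.incremental_lookups += 1
            for ts, value in reversed(self.incremental.get(key, [])):
                if ts <= snapshot_ts:
                    return value
        return self.baseline.get(key)


store = SnapshotStore({"a": 1, "b": 2})
before = store.snapshot()
store.write("a", 10)
after = store.snapshot()
old_value = store.read("a", before)  # 1: the snapshot predates the write
new_value = store.read("a", after)   # 10: the snapshot includes the write
```

Because readers only ever look at versions committed at or before their snapshot timestamp, the write to "a" does not block or disturb the earlier snapshot. A read of a key that was never updated, such as "b", normally returns straight from baseline data without probing the incremental versions.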
Keywords/Search Tags:Scalable Database System, LSM, Snapshot Isolation, Network I/O, Bloom Filter, Read Performance