High Performance Transaction Processing On Distributed Storage

Posted on: 2019-05-31  Degree: Doctor  Type: Dissertation
Country: China  Candidate: T Zhu  Full Text: PDF
GTID: 1368330563955275  Subject: Software engineering
Abstract/Summary:
Relational database systems were developed in the 1970s. One of their most important features is transaction processing, which greatly simplifies application development. Hence, transactional database systems have been widely adopted in many areas, such as banking, stock trading, and online shopping. In recent years, the fast development of the Internet has driven the expansion of applications, which requires back-end database systems to support high-throughput transaction processing and massive data storage. However, the performance of disk-based DBMSs is largely limited by slow disk I/O, so they struggle to meet the increasing performance requirements of new applications.

Benefiting from advances in software and hardware, many new database systems have been designed to satisfy the data management requirements of Internet businesses. From the hardware perspective, memory and multi-core CPUs have advanced greatly, which has enabled the implementation of in-memory database systems. Compared with disk-based systems, in-memory DBMSs achieve an order of magnitude increase in transaction throughput. From the software perspective, Google first successfully developed and deployed several large distributed database systems. Subsequently, many NoSQL systems came onto the market. They can use a number of inexpensive commodity servers to provide excellent data storage and read/write capabilities.

However, these systems have some non-negligible limitations. For example, in-memory systems require the database to fit in the main memory of a single server, and applications must invoke transactions as stored procedures. NoSQL systems sacrifice some important features (e.g. the relational model, SQL, and transactions) for good scalability.

To satisfy both the transaction processing and the data storage requirements, this work presents Cedar, a new distributed database system that combines the advantages of in-memory transaction processing and distributed storage. It uses a two-layer distributed log-structured merge-tree (LSM-Tree) to organize all data. It decouples three modules of the system: data storage, transaction management, and query processing. Overall, a distributed storage engine manages a consistent database snapshot; an in-memory transaction engine manages the remaining, newly committed data; in-memory data is periodically merged into the distributed storage; and lastly, an extensible distributed query layer is responsible for SQL processing, transaction logic execution, and so on. This work focuses on improving the transaction performance of Cedar, and its main contributions are:

1. Transaction management and communication optimization on the distributed LSM-Tree. The architecture of Cedar is analogous to a two-layer distributed LSM-Tree. Firstly, a new concurrency control method is proposed to enable high-throughput transaction processing on Cedar. Secondly, based on the transaction management module, a data compaction algorithm is designed to merge in-memory data back into the distributed storage engine without blocking or interrupting any ongoing transactions. Finally, based on the characteristics of the data storage, we design caching, transaction compilation, and other techniques to effectively reduce inter-node communication during transaction execution.

2. Optimizing interactive transaction processing on the in-memory transaction engine. Existing in-memory systems assume that transactions are executed as stored procedures, which limits the flexibility with which applications can operate on the database. Hence, we consider how to improve the performance of interactive transaction processing, which is more complex and difficult. Since each transaction involves multiple network interactions between the client and the server, it generates considerable CPU blocking and context switching overhead. We propose a coroutine-based execution model to better handle blocking that results from network I/O or transaction conflicts. In addition, the latency of network interaction lengthens the duration of each transaction, resulting in more conflicts. We design a lightweight, multi-core scalable lock manager to efficiently identify and schedule conflicting operations.

3. Optimizing read-only request processing on the distributed LSM-Tree. The distributed LSM-Tree is inefficient at serving read-only requests due to its characteristics. We propose a precise data access method. It mainly consists of a Bloom filter that can be efficiently updated and synchronized, and a lease-based synchronization mechanism. The former efficiently synchronizes information about data existence among multiple nodes and filters out most useless remote accesses. The latter ensures that such filtering does not weaken the consistency of the result. By reducing the number of useless inter-node communications, the algorithm not only decreases the latency of read requests, but also significantly reduces the resources used by read processing and increases the overall system throughput.

In summary, we investigate how to achieve high-throughput transaction processing based on a new distributed architecture. Firstly, recognizing the major drawbacks of the distributed log-structured merge-tree, this work designs a new concurrency control scheme and data access optimizations so that data compaction and network communication do not limit throughput. Secondly, since existing in-memory transaction engines are not well designed for interactive workloads, this work designs a new execution engine and a lock manager to reduce the CPU overhead of transaction execution. Lastly, considering that application workloads also contain many read-only requests, we design a precise data access algorithm to reduce useless network communication. Future directions include improving the efficiency of the data compaction operation, designing scalable transaction processing schemes, and improving the performance of analytic query processing.
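The read path described in contribution 3 can be illustrated with a minimal sketch. It is not Cedar's implementation: the class names (BloomFilter, TransactionNode, QueryNode), the lease length, the filter sizing, and the push-on-commit rule are all assumptions made for illustration, and time is passed in explicitly as `now` so the example is deterministic. The idea shown is that a query node keeps a leased copy of a Bloom filter over the keys held in the in-memory layer, and only contacts the transaction node when the filter says the key might have newer data there.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes are arbitrary illustrative choices."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

    def copy(self):
        clone = BloomFilter(self.num_bits, self.num_hashes)
        clone.bits = bytearray(self.bits)
        return clone


class TransactionNode:
    """Hypothetical in-memory layer holding newly committed data."""

    def __init__(self, lease_seconds=1.0):
        self.delta = {}
        self.filter = BloomFilter()
        self.lease_seconds = lease_seconds
        self.leases = {}        # query node -> lease expiry time
        self.remote_reads = 0   # counts simulated network round trips

    def grant_lease(self, node, now):
        self.leases[node] = now + self.lease_seconds
        return self.filter.copy(), self.leases[node]

    def commit(self, key, value, now):
        self.delta[key] = value
        self.filter.add(key)
        # Assumed rule: update every replica with a live lease before the
        # commit is acknowledged; expired holders must renew before reading.
        for node, expiry in self.leases.items():
            if expiry > now:
                node.filter.add(key)

    def remote_get(self, key):
        self.remote_reads += 1
        return self.delta.get(key)


class QueryNode:
    """Hypothetical reader with a local snapshot and a leased filter copy."""

    def __init__(self, txn_node, snapshot):
        self.txn_node = txn_node
        self.snapshot = snapshot
        self.filter = BloomFilter()
        self.lease_expiry = float("-inf")

    def read(self, key, now):
        if now >= self.lease_expiry:  # lease expired: fetch a fresh filter
            self.filter, self.lease_expiry = self.txn_node.grant_lease(self, now)
        if self.filter.might_contain(key):
            value = self.txn_node.remote_get(key)
            if value is not None:
                return value
        return self.snapshot.get(key)  # filter miss: no remote access needed


txn = TransactionNode()
reader = QueryNode(txn, snapshot={"a": 1, "b": 2})
print(reader.read("a", now=0.0))   # served from the snapshot
txn.commit("b", 20, now=0.5)
print(reader.read("b", now=0.6))   # newer value fetched from the in-memory layer
```

In this sketch a key that was never committed to the in-memory layer normally costs no remote call, while a Bloom-filter false positive only costs one extra round trip and never a wrong answer.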
Keywords/Search Tags:Transaction Processing, Distributed System, In-Memory Database System, Log-Structured Merge-Tree, Concurrency Control