Research On Performance And Reliability Of Large Data Processing

Posted on: 2014-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2208330434972538
Subject: Computer software and theory
Abstract/Summary:
With the exponential growth of data on the Internet comes the challenge of storing and analyzing that data efficiently. Many distributed storage and computing systems have emerged to cope with big data. Among them, distributed NoSQL systems combine the features of distributed systems with the natural horizontal scalability of NoSQL, and have become a good example of storage systems for big data. They trade strong consistency for availability, which improves performance; in such systems, simply adding nodes to the cluster also improves performance directly.

To reduce network transmission, distributed NoSQL systems provide batch operations to clients. However, in today's typical distributed NoSQL systems, batch writes, which account for a large proportion of operations in applications, still face some crucial problems. I/O bursts are common in write-heavy applications, which require both high throughput and steady performance. Lock contention in the iterations of batch handling, where locks are used to ensure the atomicity of data updates, causes severe false conflicts that degrade performance significantly. In addition, network and I/O contention between daemon threads causes performance thrashing.

In this thesis, we provide a comprehensive solution to these problems. First, we use two improved batching algorithms to accelerate the handling of write operations on the server node, improving throughput. Second, we employ VLog, which parallelizes writes to the write-ahead log (WAL), to reduce waiting time. Third, we design an adaptive flushing model that reschedules background worker threads based on monitoring of network resource access, reducing contention between threads and performance thrashing and improving quality of service.

We have implemented a prototype system, EnhBase, based on vanilla HBase, a typical distributed NoSQL system. Experiments show that EnhBase achieves a significant performance improvement and can reduce performance thrashing within a short period of time.
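As a rough illustration of the client-side batching mentioned in the abstract, the following Python sketch buffers individual writes and ships them in groups, so that several writes share one network round trip. It is not the thesis's batching algorithm and not the HBase client API: the class `BatchingWriter`, the `send_batch` callback, and the `max_batch` threshold are all hypothetical names chosen for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical record shape for this sketch: (row key, value).
Put = Tuple[str, str]


class BatchingWriter:
    """Buffers puts and sends them in groups to cut network round trips.

    Illustrative only; this is not EnhBase or HBase code.
    """

    def __init__(self, send_batch: Callable[[List[Put]], None], max_batch: int = 3):
        if max_batch < 1:
            raise ValueError("max_batch must be at least 1")
        self._send_batch = send_batch  # assumed transport: one call = one round trip
        self._max_batch = max_batch
        self._buffer: List[Put] = []

    def put(self, row: str, value: str) -> None:
        """Queue one write; ship the buffer once it reaches max_batch entries."""
        self._buffer.append((row, value))
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as a single batch (no-op when empty)."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)


if __name__ == "__main__":
    sent: List[List[Put]] = []
    writer = BatchingWriter(sent.append, max_batch=3)
    for i in range(7):
        writer.put(f"row{i}", "v")
    writer.flush()  # ship the final partial batch
    print([len(batch) for batch in sent])
```

A larger `max_batch` means fewer round trips but larger bursts arriving at the server, which is where the write-path problems described in the abstract (lock contention during batch handling, WAL waiting time, flush contention) come into play.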
Keywords/Search Tags:Distributed system, Design, Big data, Performance optimization, NoSQL