
Research On Replica Placement And Selection Strategies In Heterogeneous Cluster Storage System For Big Data

Posted on: 2016-07-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Q Xiong
GTID: 1108330503477247
Subject: Computer application technology
Abstract/Summary:
In the era of big data, cluster storage systems, as the next-generation enterprise data storage architecture, have been widely deployed in production to address limited storage capacity, I/O performance bottlenecks, and high storage cost. Although this is a feasible and scalable solution, fault tolerance is a critical issue, because a cluster storage system is built from thousands of unreliable commodity hardware components. For this reason, replication techniques are widely applied to guarantee high system reliability and availability. With the growing popularity of diverse big data applications, there is an urgent need to improve the performance of heterogeneous cluster storage systems, especially the effectiveness of replica placement and selection in heterogeneous environments. Although many studies have addressed this problem, it remains challenging. First, static placement strategies are easy to implement, but their effectiveness has not been fully proven and they do not account for the heterogeneity of cluster storage devices, which limits their adoption. Dynamic placement strategies balance the I/O workload well, but they are much more complex than static strategies and less efficient. Moreover, existing dynamic placement strategies focus mainly on workload balancing: they are not energy-efficient, they do not consider hardware heterogeneity, and they also suffer from several practical limitations.
Finally, existing replica selection strategies have deficiencies such as poor scalability, and they are not application-oriented: they do not aim to optimize the performance of specific big data applications with individual QoS sensitivity constraints.
This thesis focuses on big-data-oriented optimization techniques for replica management in large-scale heterogeneous cluster storage systems. Specifically, it designs novel strategies and algorithms that help industry build highly reliable and highly available cluster storage systems with massive capacity, low cost, and high scalability. The thesis covers the following four topics:
1) Static replica placement strategies are studied systematically based on queuing theory; the results provide a basis for designing cluster storage systems for enterprise data centers.
2) A dynamic replica placement strategy is designed for big data applications running on Hadoop clusters. The proposed strategy not only improves application performance but is also energy-efficient and space-saving.
3) A QoS-preference-aware replica selection algorithm is proposed to handle individual QoS sensitivity constraints. It guarantees the QoS requirements of big data applications by solving a replica selection problem with multi-dimensional QoS constraints.
4) A multi-tier cluster storage data management system, named SEU-Storm, is designed and implemented to store massive AMS-02 experiment data. It improves the efficiency of AMS-02 applications by providing optimized replication management strategies.
The thesis also evaluates the proposed strategies and algorithms through a series of simulations and through production runs of the AMS-02 experiment.
The evaluation results show that the proposed strategies and algorithms effectively improve the performance of large-scale heterogeneous cluster storage systems. The work also offers a new solution to the challenges faced by real-world large-scale cluster storage systems.
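The abstract does not spell out the queuing model behind topic 1. The sketch below is only a hedged illustration of why device heterogeneity matters for static placement. It is not the thesis's model. It assumes each storage node behaves as an independent M/M/1 queue with service rate mu_i, and that the read traffic for one replicated block is split across the nodes holding its replicas. The function names (`mm1_response_time`, `mean_response_time`, `proportional_shares`) and all rates are hypothetical.

```python
import math
from typing import Sequence


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue (waiting + service).

    Returns math.inf when the node is saturated (arrival_rate >= service_rate),
    because the queue then grows without bound.
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def mean_response_time(total_rate: float,
                       service_rates: Sequence[float],
                       shares: Sequence[float]) -> float:
    """Request-weighted mean response time for one replicated block.

    service_rates[i] is the (hypothetical) service rate of the node holding
    replica i; shares[i] is the fraction of the block's read traffic routed
    to that replica. Each node is modeled as an independent M/M/1 queue.
    """
    if len(service_rates) != len(shares):
        raise ValueError("one share per replica is required")
    mean = 0.0
    for mu, share in zip(service_rates, shares):
        if share <= 0:
            continue  # this replica receives no traffic
        mean += share * mm1_response_time(total_rate * share, mu)
    return mean


def proportional_shares(service_rates: Sequence[float]) -> list[float]:
    """Route traffic in proportion to each node's service rate."""
    capacity = sum(service_rates)
    return [mu / capacity for mu in service_rates]


if __name__ == "__main__":
    rates = [100.0, 50.0, 25.0]   # hypothetical fast, medium, slow nodes (req/s)
    load = 90.0                   # hypothetical read rate for the block (req/s)
    uniform = [1 / len(rates)] * len(rates)
    print("uniform split:     ", mean_response_time(load, rates, uniform))
    print("proportional split:", mean_response_time(load, rates, proportional_shares(rates)))
```

With a uniform split, the slowest node receives 30 req/s against a capacity of 25 req/s, so it saturates and the mean response time is infinite. With a capacity-proportional split, each replica contributes 1/(sum(mu) - lambda), so k replicas give k/(sum(mu) - lambda), here 3/85, or about 0.035 s. Under these assumptions, both the choice of nodes and the routing of reads must account for heterogeneity.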
Keywords/Search Tags: Cluster Storage, Big Data, Heterogeneity, Cloud Computing, Replica Placement, Replica Selection