Research And Improvement Of CRUSH Algorithm In Ceph Distributed Storage System

Posted on: 2017-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Mu
GTID: 2348330485484003
Subject: Computer application technology
Abstract/Summary:
As big data workloads shift from computing centers to data centers, distributed storage systems, as part of the underlying infrastructure, face many challenges, including rapid data growth, diverse data types, and higher performance requirements. In response, distributed storage technology must first solve three major technical problems: data consistency, reliability, and load balancing. Because data distribution directly affects the load balance of a cluster, a good data distribution algorithm is particularly important for a distributed storage system. CRUSH is the data distribution algorithm of the Ceph distributed storage system and is based on the cluster's actual physical structure. It builds a multi-level mapping table and returns a suitable set of storage nodes on which to store a given data object. CRUSH uses consistent hashing to generate a pseudo-random number and combines it with node weights in a weighted calculation. CRUSH therefore takes the allocation of storage locations into account and, in most cases, distributes data uniformly enough that the load balance of the cluster is not affected. Its behavior is, however, constrained by factors such as data object size, cluster scale, and deployment plan. In practice, if each stored data object is very small, storing it does not noticeably change node weights, and a group of highly related data can end up on the same set of storage nodes.
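The abstract describes CRUSH as combining a hash-derived pseudo-random number with node weights. The minimal Python sketch below illustrates that idea, modelled loosely on Ceph's straw2 bucket selection: every node draws `ln(u) / weight` from a hash of the object and node, and the largest draw wins. SHA-256 stands in for CRUSH's own hash function, and the node names and weights are hypothetical. It illustrates weighted pseudo-random selection in general and is not the thesis's or Ceph's actual code.

```python
import hashlib
import math
from collections import Counter


def unit_hash(obj_id: str, node: str) -> float:
    # Stand-in for CRUSH's hash (assumption: SHA-256 instead of rjenkins1).
    # Deterministically maps (object, node) to a float in (0, 1].
    digest = hashlib.sha256(f"{obj_id}/{node}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def straw2_select(obj_id: str, weights: dict[str, float]) -> str:
    # Each node draws ln(u) / weight (a value <= 0). Heavier nodes divide by a
    # larger weight, so their draws sit closer to 0 and win more often; a node
    # is chosen with probability proportional to its weight.
    return max(weights, key=lambda n: math.log(unit_hash(obj_id, n)) / weights[n])


if __name__ == "__main__":
    # Hypothetical cluster: osd.2 has twice the weight of the other two nodes.
    weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0}
    counts = Counter(straw2_select(f"obj-{i}", weights) for i in range(10_000))
    print(counts)  # osd.2 should receive roughly half of the objects
```

In this sketch each node's draw depends only on the object and that node, so adding a node can only move objects onto the new node. Note also that the choice ignores object size and access frequency entirely, which is the gap the abstract goes on to describe for small, highly related objects.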
Access requests for such a set of data then cause a load balancing problem: a few nodes are heavily used while most nodes in the cluster sit idle, and the performance of the whole cluster degrades quickly. On the other hand, when the scale of the storage cluster changes, for example through expansion, deletion of historical data, or failure of backup nodes, the resulting differences in node weights can also cause small objects to fall onto the same nodes, so CRUSH again distributes data unevenly and cluster performance suffers. This study found that these problems stem from CRUSH's data distribution algorithm, which does not fully account for the diversity and specific characteristics of different cluster scales. This thesis therefore improves CRUSH in three respects; the main content is summarized as follows:
(1) It studies the Ceph distributed storage system, introducing Ceph's innovations and implementation principles in comparison with traditional distributed storage architectures, and runs simulated large-scale Ceph cluster tests to verify that the current Ceph distributed storage system does suffer from uneven data distribution.
(2) It improves CRUSH's data distribution by adding a temperature factor that constrains the placement of small data objects. When CRUSH selects a result, it considers not only the high-priority node weight but also a temperature value that changes each time a node is accessed. When the same node is accessed continuously, its temperature rises quickly and works against its weight, and CRUSH gives priority to nodes with lower temperature.
(3) The improved CRUSH algorithm is integrated into the Ceph distributed storage system in a modular way, which makes Ceph deployment more flexible: depending on the actual situation, users can choose whether to enable the optimization and tune it through parameters without restarting the entire cluster and its backups.
Based on this research, the thesis sets up an experimental environment, builds a Ceph distributed storage cluster, and simulates a variety of application scenarios to test it. Finally, it analyzes data distribution statically with rados bench, records disk usage dynamically on each storage node with iostat, and uses iozone with different parameters to read from and write to the entire cluster to measure throughput. Comparing the conventional and improved CRUSH algorithms verifies the feasibility of the improved algorithm and the extent of the optimization. Experimental results show that the improved CRUSH algorithm can solve the load balancing problem caused by storing small data objects.
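The temperature factor in item (2) can be illustrated by adding a per-node temperature to a weighted pseudo-random selection. The abstract does not give the thesis's formulas, so everything specific below is hypothetical: dividing a node's weight by `1 + alpha * temperature`, the heat-and-decay update in `access`, and the parameters `alpha`, `heat`, and `decay` are illustrative assumptions, and SHA-256 stands in for CRUSH's hash. The sketch only shows the stated intent, that a node accessed repeatedly becomes less likely to be chosen than a cooler node of the same weight.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: float
    temperature: float = 0.0  # hypothetical per-node access "heat"


def unit_hash(obj_id: str, node: str) -> float:
    # Stand-in for CRUSH's hash (assumption): maps (object, node) to (0, 1].
    digest = hashlib.sha256(f"{obj_id}/{node}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def effective_weight(node: Node, alpha: float) -> float:
    # Hypothetical rule: temperature works against weight, so a hotter
    # node behaves like a lighter one in the draw.
    return node.weight / (1.0 + alpha * node.temperature)


def select(obj_id: str, nodes: list[Node], alpha: float = 0.5) -> Node:
    # straw2-style draw on the temperature-adjusted weight: the largest
    # ln(u) / w wins, so cooler nodes of equal weight are preferred.
    return max(
        nodes,
        key=lambda n: math.log(unit_hash(obj_id, n.name)) / effective_weight(n, alpha),
    )


def access(node: Node, nodes: list[Node], heat: float = 1.0, decay: float = 0.9) -> None:
    # Hypothetical update: every node cools slightly on each access,
    # and the accessed node heats up, so repeated hits accumulate quickly.
    for n in nodes:
        n.temperature *= decay
    node.temperature += heat


if __name__ == "__main__":
    nodes = [Node("osd.0", 1.0), Node("osd.1", 1.0), Node("osd.2", 1.0)]
    for i in range(10):
        target = select(f"obj-{i}", nodes)
        access(target, nodes)  # assumption: each placement counts as an access
        temps = ", ".join(f"{n.name}={n.temperature:.2f}" for n in nodes)
        print(f"obj-{i} -> {target.name} ({temps})")
```

One consequence of this design is that, once temperature enters the draw, an object's placement depends on access history and is no longer a pure function of the object ID and node weights, so a real implementation would need some way to record or reconcile where each object was written.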
Keywords/Search Tags: Ceph distributed storage, CRUSH algorithm, consistent hashing, temperature factor, load balancing