Research On Data Placement For Distributed Storage Systems With Heterogeneous Resources

Posted on: 2019-08-04 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W Zhou | Full Text: PDF
GTID: 1368330548455217 | Subject: Computer system architecture
Abstract/Summary:
Distributed storage systems with heterogeneous resources satisfy the real-time processing and large-capacity requirements of big data. Because the access path and performance of data depend largely on where the data reside, data placement is critical for improving device efficiency and storage performance. Yet data placement in a heterogeneous environment faces challenges from resource complexity and data variation. Existing works either do not consider the asymmetric features inside devices or fail to use all resources in a balanced, efficient way. In pursuit of better resource utilization and performance, this dissertation studies high-quality data placement policies that place data on the devices that best fit their access characteristics.

First, the dissertation proposes a Preference Aware Data Placement (PADP) policy, which places data among heterogeneous devices by exploiting both the asymmetric read/write performance of devices and the asymmetric read/write characteristics of data. PADP assumes that data prefer to reside on the device that provides higher storage performance, and it defines a preference degree to quantify the storage performance imbalance that arises when data are distributed on different devices. A preference model calculates the preference degree of data from their read/write access frequency, and data are then placed on the device with a high preference degree to obtain high storage performance. In addition, small files with high access frequency are given priority for placement on emerging storage devices such as flash SSDs, and accelerating small files may improve the efficiency of the whole system. According to the experimental results, PADP gains a 30% performance improvement over the widely used HDFS.

Second, the dissertation proposes a CrossFire based Storage Acceleration (CFSA) policy, which maximizes resource utilization and storage performance by intelligently distributing data to each device. As storage technology improves rapidly, hybrid storage systems consisting of several types of SSDs will gradually be adopted. Existing works mostly concentrate on fully utilizing the high-performance device and neglect the capability of the low-performance device. CFSA introduces a device crossfire method that boosts hybrid storage performance by efficiently leveraging both high-performance and low-performance devices. Performance-critical data are appropriately off-loaded to the low-performance device to exploit access parallelism. This alleviates the bottleneck on the high-performance device and improves overall storage performance. In experiments, compared with the classic solution that puts all critical data on the high-performance device, CFSA improves throughput by 42.6% and reduces latency by 35.0%.

Third, the dissertation proposes a Load Aware Data Migration (LADM) policy, which balances the load among nodes and thereby improves performance. It designs a unified evaluation mechanism that fairly links resource utilization to performance impact. Unlike existing works, which neglect the differing influence of different data, LADM divides load into network load, device I/O load, and capacity load, and classifies data into three classes: hot data, warm data, and cold data. The relationships between the different data classes and the different load types are then evaluated, from which the profit of migrating different data to different targets can be obtained. LADM maximizes migration efficiency by selecting the migration solution with the highest profit. According to the experiments, LADM gains a 25% throughput improvement through data migration, and an 11% improvement over the classic policy based on hot-spot migration.
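The preference idea behind PADP can be illustrated with a small sketch. The abstract does not give the dissertation's actual preference model, so the formula below is only an assumption: it weights a device's read and write throughput by the share of reads and writes in a data item's access history, then places the item on the device with the highest score. The class names, function names, and throughput figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    read_mbps: float   # hypothetical read throughput
    write_mbps: float  # hypothetical write throughput


@dataclass
class DataItem:
    name: str
    reads: int   # observed read accesses
    writes: int  # observed write accesses


def preference_degree(item: DataItem, device: Device) -> float:
    """Assumed scoring rule, not the dissertation's model:
    device throughput weighted by the item's read/write mix."""
    total = item.reads + item.writes
    if total == 0:
        return 0.0  # no access history, no preference
    read_share = item.reads / total
    write_share = item.writes / total
    return read_share * device.read_mbps + write_share * device.write_mbps


def place(item: DataItem, devices: list[Device]) -> Device:
    """Pick the device with the highest preference degree for this item."""
    return max(devices, key=lambda d: preference_degree(item, d))


# Two devices with opposite read/write asymmetry (made-up numbers).
devices = [
    Device("read_optimized", read_mbps=500.0, write_mbps=100.0),
    Device("write_optimized", read_mbps=200.0, write_mbps=300.0),
]
read_heavy = DataItem("log_index", reads=90, writes=10)
write_heavy = DataItem("ingest_buffer", reads=10, writes=90)

print(place(read_heavy, devices).name)
print(place(write_heavy, devices).name)
```

With these made-up figures, the read-heavy item scores higher on the read-optimized device and the write-heavy item on the write-optimized one, which is the kind of asymmetry-aware decision the abstract attributes to PADP. A fuller model would presumably also account for file size and device capacity, which this sketch omits.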
Keywords/Search Tags:Distributed storage, Heterogeneous resource, Data placement, Storage acceleration, Data migration