
Research On Performance Isolation For Multi-Tenant Cloud Storage Systems

Posted on: 2021-01-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Liu
GTID: 1488306107957419
Subject: Computer system architecture
Abstract/Summary:
The rapid development of information technology has led to explosive growth in data and workloads. As cloud computing technology matures, more and more enterprises and users deploy their workloads on cloud computing platforms. For easier management and higher resource utilization, cloud providers generally adopt a resource-sharing model to provision IT capabilities such as hardware, software, or services dynamically. Unfortunately, resource sharing can introduce resource contention and performance interference. As a key component of cloud platforms, the cloud storage system is responsible for providing storage services, so it is important to study how the cloud storage system can provide tenants with guaranteed service. However, the diversity of tenant requirements and the complexity of the I/O stack in cloud storage systems make performance isolation more challenging. In this dissertation, we adopt software-defined approaches to achieve performance isolation at three stages of the I/O stack: the distributed file system, the local file system of the storage server, and the storage medium.

At the stage of the distributed file system, we present a recovery-aware performance isolation scheme called CoPR. Distributed file systems inevitably suffer failures while providing storage services, and to ensure the reliability of tenant data they must recover corrupted data in a timely manner. However, because tenant requests and recovery requests compete for the same resources, it is challenging to guarantee tenant performance and recovery performance simultaneously. CoPR coordinates performance isolation and recovery optimization at the distributed file system level through a two-level scheduling approach. The first-level scheduler allocates resources between tenant requests and recovery requests using a reinforcement-learning-based mechanism. When recovery is triggered, the first-level scheduler can
adjust resource allocations to optimize recovery without violating tenant SLOs. The second-level scheduler adopts a software-defined approach that allocates resources among tenants at per-block-device granularity. Specifically, the control plane records tenant performance requirements in the metadata of virtual block devices, and the data plane allocates resources among block devices according to those requirements. Our experiments on Ceph show that CoPR can achieve its isolation goals and reduce recovery time by 35.8%-55.8%.

At the stage of the local file system, we present a behavior-aware performance isolation scheme called SDFS. In cloud platforms, the virtual disks of VMs can be stored as large files on shared, networked storage servers. Existing isolation techniques do not account for the behavior of the file system used by these storage servers, so underlying resource usage is unpredictable (e.g., the delayed writeback mechanism can postpone writes and the journaling mechanism can amplify writes). Without visibility into underlying resource usage, isolation goals cannot be met. SDFS exploits the underlying file system to allocate resources at per-image-file granularity and to provide tenants with guaranteed throughput. SDFS comprises two components: a control plane and a data plane. The control plane provides a set of system calls that record tenant performance requirements in the metadata of image files, and it disseminates these requirements to the block layer by multiplexing the data path. The data plane builds a file-based scheduler that manages memory and disk resources according to tenant performance requirements. SDFS requires no modification to guest OSes, hypervisors, or file server protocols. Using a prototype implementation, we demonstrate that SDFS can reduce performance interference by 76.4% with
negligible overhead.

At the stage of the NVMe SSD, we present an asymmetric resource allocation scheme for performance isolation called CostPI. Owing to their high throughput, low response time, and low power consumption, NVMe SSDs have been widely adopted to provide storage services in cloud platforms where diverse workloads are colocated. To achieve performance isolation, existing solutions partition the shared SSD into multiple isolated regions and assign each workload a separate region. However, these solutions cannot reduce the interference caused by contention for the SSD's embedded cache, and they can lead to inefficient resource utilization and imbalanced wear. CostPI aims to achieve cost-effective performance isolation through fully isolated virtualized SSDs (FI-vSSDs) and para-isolated virtualized SSDs (PI-vSSDs). FI-vSSDs serve latency-sensitive workloads and comprise dedicated resources, including the data cache, the mapping table cache, and NAND flash. PI-vSSDs serve throughput-oriented and capacity-oriented workloads and comprise shared resources. At the control plane, virtualized SSDs can be customized through configuration files. At the data plane, requests are scheduled and resources are allocated according to these configurations. Specifically, at the NVMe queue level, we present an SLO-aware arbitration mechanism; at the embedded cache level, we partition the cache asymmetrically and apply different cache policies to different data cache partitions; and at the NAND flash level, we allocate hardware resources at channel granularity. Our experiments show that, for the shared NVMe SSD, CostPI can reduce performance interference by 44.2%-89.5%, increase resource utilization by 11.4%, and reduce wear imbalance by 52.8%.
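The abstract does not spell out the reinforcement-learning formulation behind CoPR's first-level scheduler. As a much-simplified illustration of the idea, the sketch below treats the recovery bandwidth share as the action of an epsilon-greedy multi-armed bandit and rewards recovery throughput only while a tenant SLO is still met. The capacity, SLO, candidate shares, penalty, and every name here are hypothetical, and a bandit is a swapped-in stand-in, not CoPR's actual learner.

```python
import random

# Hypothetical numbers for illustration only.
CAPACITY_MBPS = 1000      # total backend bandwidth of a storage node
TENANT_SLO_MBPS = 700     # bandwidth tenants need to meet their SLOs
SLO_PENALTY = -1000       # reward when an action violates the tenant SLO


def simulated_reward(share_pct: int) -> int:
    """Reward recovery bandwidth, but penalize any share that breaks the tenant SLO."""
    recovery_mbps = CAPACITY_MBPS * share_pct // 100
    tenant_mbps = CAPACITY_MBPS - recovery_mbps
    if tenant_mbps < TENANT_SLO_MBPS:
        return SLO_PENALTY
    return recovery_mbps


class EpsilonGreedyAllocator:
    """Epsilon-greedy bandit over candidate recovery shares (percent of capacity)."""

    def __init__(self, shares, epsilon=0.1, seed=0):
        self.shares = list(shares)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {s: 0 for s in self.shares}
        self.values = {s: 0.0 for s in self.shares}  # running mean reward per share

    def choose(self) -> int:
        # Try every share once before exploiting.
        for s in self.shares:
            if self.counts[s] == 0:
                return s
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.shares)  # explore
        return self.best()  # exploit

    def best(self) -> int:
        return max(self.shares, key=lambda s: self.values[s])

    def update(self, share: int, reward: float) -> None:
        self.counts[share] += 1
        self.values[share] += (reward - self.values[share]) / self.counts[share]


allocator = EpsilonGreedyAllocator(shares=[10, 20, 30, 40, 50])
for _ in range(200):
    share = allocator.choose()
    allocator.update(share, simulated_reward(share))

print(allocator.best())  # 30: the largest recovery share that still leaves tenants 700 MB/s
```

In a real system the reward would come from measured latency and recovery progress rather than a closed-form function, and conditions shift as failures occur, which is where a stateful reinforcement-learning formulation would go beyond this stateless bandit.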
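CoPR's second-level scheduler allocates resources among tenants per virtual block device, using requirements that the control plane stores in each device's metadata. One way to picture the data-plane step is a reservation-plus-weighted-share split of a storage node's IOPS. In this sketch the metadata is a plain Python object, and the field names (`reserved_iops`, `weight`), the numbers, and the split policy are all hypothetical; the abstract does not state CoPR's exact allocation rule.

```python
from dataclasses import dataclass


@dataclass
class DeviceMetadata:
    """Hypothetical per-device requirements recorded by the control plane."""
    reserved_iops: int  # guaranteed minimum
    weight: int         # relative share of any spare capacity


def allocate_iops(capacity: int, devices: dict) -> dict:
    """Give each virtual block device its reservation, then split the surplus by weight."""
    total_reserved = sum(m.reserved_iops for m in devices.values())
    if total_reserved > capacity:
        raise ValueError("reservations exceed capacity; admission control should reject")
    surplus = capacity - total_reserved
    total_weight = sum(m.weight for m in devices.values())
    shares = {name: m.reserved_iops + surplus * m.weight // total_weight
              for name, m in devices.items()}
    # Hand any integer-division remainder to the heaviest device so no IOPS are wasted.
    leftover = capacity - sum(shares.values())
    if leftover:
        heaviest = max(devices, key=lambda n: devices[n].weight)
        shares[heaviest] += leftover
    return shares


devices = {
    "vol-a": DeviceMetadata(reserved_iops=3000, weight=1),
    "vol-b": DeviceMetadata(reserved_iops=2000, weight=1),
    "vol-c": DeviceMetadata(reserved_iops=1000, weight=2),
}
print(allocate_iops(10_000, devices))  # {'vol-a': 4000, 'vol-b': 3000, 'vol-c': 3000}
```

Keeping the requirements with the device, rather than in a separate tenant table, means the data plane can make this decision using only what it already sees on the I/O path.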
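SDFS rests on the observation that a write's real cost only becomes visible below the file system: journaling can amplify it, and delayed writeback can issue it later from a different context. A minimal sketch of a file-based scheduler under those assumptions follows. Each block-layer request carries the image file it came from (standing in for SDFS's dissemination of requirements along the data path), and writes are charged to that file's per-interval byte budget with a journaling multiplier. The `JOURNAL_FACTOR` value, the budgets, and all class and field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical: with data journaling, each data byte is written twice (journal + home location).
JOURNAL_FACTOR = 2


@dataclass
class BlockRequest:
    image_file: str  # tag carried from the file system down to the block layer
    nbytes: int
    is_write: bool


class FileBasedScheduler:
    """Charges block-layer I/O to the originating image file's per-interval byte budget."""

    def __init__(self, budgets: dict):
        self.budgets = dict(budgets)          # bytes allowed per accounting interval
        self.used = {f: 0 for f in budgets}
        self.deferred = []                    # requests held until the next interval

    def cost(self, req: BlockRequest) -> int:
        # Charge writes at their amplified on-disk cost, not their logical size.
        return req.nbytes * JOURNAL_FACTOR if req.is_write else req.nbytes

    def submit(self, req: BlockRequest) -> bool:
        charge = self.cost(req)
        if self.used[req.image_file] + charge <= self.budgets[req.image_file]:
            self.used[req.image_file] += charge
            return True                       # dispatch to the device now
        self.deferred.append(req)
        return False

    def new_interval(self) -> list:
        """Reset budgets and retry deferred requests; return those dispatched."""
        self.used = {f: 0 for f in self.budgets}
        pending, self.deferred = self.deferred, []
        return [r for r in pending if self.submit(r)]


sched = FileBasedScheduler({"vm1.img": 8192, "vm2.img": 8192})
# The tag, not the issuing thread, decides whose budget is charged, so a
# delayed writeback for vm1 would still count against vm1.
print(sched.submit(BlockRequest("vm1.img", 4096, is_write=True)))   # True  (charged 8192)
print(sched.submit(BlockRequest("vm1.img", 4096, is_write=True)))   # False (over budget)
print(sched.submit(BlockRequest("vm2.img", 4096, is_write=False)))  # True  (vm2 unaffected)
print(len(sched.new_interval()))                                     # 1
```

Charging at the amplified cost is what keeps one VM's journal-heavy writes from silently consuming another VM's disk time.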
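The abstract names an SLO-aware arbitration mechanism for CostPI at the NVMe queue level without describing it. As a hedged illustration, the sketch below applies earliest-deadline-first (EDF) across per-vSSD submission queues, where each request's deadline is its arrival time plus a latency target from a hypothetical configuration. EDF is a swapped-in technique chosen for clarity; the latency targets, queue names, and class are assumptions, and this runs in Python rather than in SSD firmware.

```python
from collections import deque


class SloAwareArbiter:
    """Earliest-deadline-first arbitration across per-vSSD submission queues."""

    def __init__(self, latency_slo_us: dict):
        self.latency_slo_us = dict(latency_slo_us)          # queue -> latency target (us)
        self.queues = {q: deque() for q in latency_slo_us}  # FIFO of (deadline, request)

    def submit(self, queue: str, request: str, arrival_us: int) -> None:
        deadline = arrival_us + self.latency_slo_us[queue]
        self.queues[queue].append((deadline, request))

    def next_request(self):
        """Pop the head request with the earliest deadline, or None if all queues are empty."""
        heads = [(q[0][0], name) for name, q in self.queues.items() if q]
        if not heads:
            return None
        _, name = min(heads)
        _, request = self.queues[name].popleft()
        return name, request


# Hypothetical config: a latency-sensitive FI-vSSD and a throughput-oriented PI-vSSD.
arb = SloAwareArbiter({"fi-vssd": 100, "pi-vssd": 5000})
arb.submit("pi-vssd", "p1", arrival_us=0)
arb.submit("fi-vssd", "f1", arrival_us=10)
arb.submit("pi-vssd", "p2", arrival_us=20)
order = []
while (item := arb.next_request()) is not None:
    order.append(item)
print(order)  # [('fi-vssd', 'f1'), ('pi-vssd', 'p1'), ('pi-vssd', 'p2')]
```

Requests for the PI-vSSD have loose deadlines rather than none, so once their slack runs out they are served ahead of newer latency-sensitive requests instead of being starved.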
Keywords/Search Tags: Cloud Storage System, Performance Isolation, Distributed File System, Local File System, Solid-State Drive