
Improving The Dependability Of Cloud Computing Systems

Posted on: 2009-05-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H B Chen
Full Text: PDF
GTID: 1118360272989293
Subject: Computer system architecture
Abstract/Summary:
Cloud computing is an appealing innovation in modern computing. By integrating networked computing resources and virtualizing them through different levels of abstraction, cloud computing provides users with massive computing resources through a common interface. Further, cloud computing hides from users the complexity of deploying and managing hardware resources, the software stack, and networking protocols. Aware of its importance, both researchers and industry have put significant effort into cloud computing. One indispensable requirement of cloud computing systems is the dependability of the infrastructure as well as of the running services. Specifically, a cloud computing system should satisfy the following criteria: (1) security, which requires that the cloud system protect both the computing systems and the running services; (2) maintainability, which requires that the system be easy to maintain, thereby mitigating the impact of inevitable hardware and software failures and of deploying new features to suit business needs; (3) availability, which demands that the system be constantly operable and provide correct services, even in the presence of possible hardware and service failures; (4) trustworthiness, which requires that the cloud system provide trustworthy services, so that users' code and data, which may contain business secrets, will not be improperly divulged or abused. Dependability has always been a major concern of computing systems and a focus of both researchers and industry. The emergence of cloud computing poses even more challenges because of the scale of a cloud computing system.
A larger scale means more management complexity and a higher probability of failure, and thus a shorter MTTF (Mean Time To Failure). Generally, previous research efforts can be categorized into three levels: (1) the computer architecture level, which investigates ways of enhancing existing processors, memory, and I/O systems to improve dependability. For example, the XOM system extends the instruction set architecture, bus, and registers to enhance the trustworthiness of mission-critical applications running on commodity operating systems. (2) the operating system level, which enhances the dependability of subsystems of existing operating systems, or even implements new operating systems. For example, Nooks improves the dependability of the driver subsystem in Linux, while Asbestos is an OS built from scratch to improve the security of applications. (3) the application level, which aims to use advances in language, compiler, and binary translator technologies to improve the security and availability of applications. For example, Ginseng is a dynamic update system built by extending an existing compiler, while LIFT uses a binary translator to perform taint tracking to defeat possible software attacks. However, several problems remain in existing research on dependability.
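The claim above that a larger scale shortens MTTF can be made concrete with a minimal sketch. It rests on assumptions that are not stated in the abstract: components fail independently, each has a constant failure rate (exponentially distributed lifetimes), and the first failure of any component counts as a system failure. Under those assumptions the failure rates add, so the time to the first failure shrinks in proportion to the number of components. The function name and the example figures are hypothetical.

```python
def system_mttf_hours(component_mttf_hours: float, n_components: int) -> float:
    """Mean time to the first failure among n identical components.

    Assumes independent components with constant failure rates, so the
    system failure rate is n * (1 / component_mttf_hours).
    """
    if component_mttf_hours <= 0 or n_components <= 0:
        raise ValueError("MTTF and component count must be positive")
    # Failure rates add under the independence assumption, so the mean
    # time to the first failure is the component MTTF divided by n.
    return component_mttf_hours / n_components


if __name__ == "__main__":
    # Hypothetical figures: a single node versus a 1000-node cluster.
    for nodes in (1, 10, 1000):
        print(nodes, system_mttf_hours(100_000.0, nodes))
```

Real systems with redundancy or correlated failures do not follow this simple model; it only illustrates why scale alone pushes the time to the first failure down.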
Most current research focuses on only one aspect while neglecting or sacrificing others. Some architecture-level solutions require non-trivial changes to the processor architecture, memory, and bus, which cannot quickly become commercially available; the XOM trust system is one example. Some operating system solutions require reconstructing the existing software stack, breaking backward compatibility. For example, Singularity and Asbestos provide completely new operating system abstractions, making it hard for existing applications to benefit from them. Some existing security systems bring prohibitive performance overhead, preventing their use in production runs. For example, the TaintCheck system incurs up to 36X performance degradation, while LIFT, currently the best-performing taint tracking system, still brings about a 3.6X slowdown.
Based on a detailed analysis of the dependability requirements of current cloud computing systems, this dissertation proposes a practical and systematic solution that improves various aspects of current cloud computing systems at different levels, without sacrificing performance and without mandating design changes to the existing architecture, OS abstractions, or applications. Specifically, the proposed solution is composed of the following key techniques and systems, which solve different problems at different levels:
1. Practical and efficient security enhancement by combining speculative execution and dynamic information flow tracking, and the design and implementation of the SHIFT and BOSH secure systems based on this idea. The SHIFT system leverages existing hardware support for deferred exception handling to implement a practical, efficient taint tracking system. SHIFT achieves the best performance among real-world taint tracking systems, with only 1% performance overhead for server applications and about 1.27X performance overhead for SPECINT-2000. Based on the idea and design of SHIFT, the BOSH system further leverages the hardware support for taint tracking to support a low-overhead binary obfuscation scheme. BOSH can obfuscate the whole control and data flow of a program to defeat attacks that alter the control and data flow, as well as to protect software copyright, with only 27% performance overhead.
2. A bi-directional write-through based synchronization scheme for dynamically updating operating system kernels and multi-threaded software, and the LUCOS and POLUS dynamic updating systems that embody the idea. LUCOS is the first system that uses virtual machine monitors to dynamically update the operating systems running on them, with less than 1% performance slowdown. It is the first system that supports updating Linux with changes to data structures, without modifying the Linux kernel. POLUS is the first system that supports online switching of multi-threaded applications among different versions, both backwards and forwards, with less than 5% performance degradation.
3. The Mercury on-demand virtualization system, which improves the availability of the cloud by tolerating possible hardware failures. Mercury supports dynamically inserting a virtual machine monitor (VMM) beneath a running operating system and then uses the VMM to migrate the whole operating system environment to another node upon a machine failure.
4. The Talos trust system infrastructure, which provides behavior conformity for both the cloud computing platform and cloud applications. Two systems implement Talos behavior conformity: CHAOS uses a VMM to protect an application running in a commodity (and untrusted) operating system, preventing the code and data of a cloud application from being divulged or abused; Shepherd, a process shepherding system, prevents cloud services from attacking the cloud platform. The performance overhead of both CHAOS and Shepherd is modest.
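The dynamic information flow tracking (taint tracking) idea behind item 1 can be illustrated with a small software-only sketch. This is not SHIFT's mechanism, which the abstract describes as relying on hardware support for deferred exception handling; it is a toy model, with all names (`Value`, `read_untrusted`, `use_as_jump_target`, `TaintError`) hypothetical, showing only the general principle: data from untrusted sources is marked as tainted, the mark propagates through computation, and security-sensitive uses reject tainted data.

```python
from dataclasses import dataclass


class TaintError(Exception):
    """Raised when tainted data reaches a security-sensitive use."""


@dataclass(frozen=True)
class Value:
    data: int
    tainted: bool = False

    def __add__(self, other: "Value") -> "Value":
        # Propagation rule: the result is tainted if either operand is.
        return Value(self.data + other.data, self.tainted or other.tainted)


def read_untrusted(raw: int) -> Value:
    # Taint source: anything arriving from outside (e.g. the network).
    return Value(raw, tainted=True)


def constant(raw: int) -> Value:
    # Program constants are trusted and therefore untainted.
    return Value(raw, tainted=False)


def use_as_jump_target(value: Value) -> int:
    # Taint sink: refuse to let untrusted data steer control flow.
    if value.tainted:
        raise TaintError("tainted value used as a jump target")
    return value.data


if __name__ == "__main__":
    print(use_as_jump_target(constant(4) + constant(8)))
    try:
        use_as_jump_target(read_untrusted(4) + constant(8))
    except TaintError as err:
        print("blocked:", err)
```

Doing this bookkeeping in software on every operation is what makes naive taint tracking slow, which is the overhead problem the abstract attributes to systems such as TaintCheck and LIFT.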
Keywords/Search Tags:Cloud computing, Dependability, Software Security, Availability, Dynamic Update