
Practical Integrity Assurance for Big Data Processing Deployed over Open Cloud

Posted on: 2014-02-06
Degree: Ph.D
Type: Thesis
University: North Carolina State University
Candidate: Wei, Wei
Full Text: PDF
GTID: 2458390005991830
Subject: Computer Science
Abstract/Summary:
The amount of data in the world has been exploding. The capability to process large data sets, so-called big data, is becoming a key basis of competition, underpinning new waves of productivity growth, research innovation, disease prevention, and crime fighting. Big data requires exceptional technologies to efficiently process large amounts of data within a reasonable time, including distributed parallel data processing, distributed file systems, cloud computing platforms, and scalable storage systems. Deploying these technologies over open cloud is a cost-effective and practical solution for small businesses and researchers who need to perform data processing tasks over large amounts of data but often lack the capability to obtain their own powerful clusters. As parties in open cloud usually come from different domains and are not always trusted, several security issues arise, including confidentiality, integrity, and availability: for example, how to transmit data securely through a public network, how to verify the integrity of data received from other parties, how to know whether a task is performed correctly, and how to detect malicious behavior during data processing. This thesis focuses on providing practical integrity assurance while deploying these techniques over open cloud.

The first work targets a distributed parallel data processing system: MapReduce. MapReduce has become increasingly popular as a powerful parallel data processing model. To deploy MapReduce as a data processing service over open systems such as service-oriented architecture, cloud computing, and volunteer computing, we must provide the security mechanisms necessary to protect the integrity of MapReduce data processing services. In this work, we present SecureMR, a practical service integrity assurance framework for MapReduce.
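One common building block for service integrity in open, partially untrusted systems of this kind is to replicate a task on independent workers and cross-check digests of their outputs. The sketch below illustrates that idea only; the function names and toy map task are hypothetical, and this is not SecureMR's exact protocol, which is realized through its security components on top of Hadoop.

```python
import hashlib

def map_task(record):
    # Toy map function (stands in for any user-supplied map task).
    return (record.strip().lower(), 1)

def run_on_worker(records, malicious=False):
    # Simulate a worker executing a map task; a malicious worker
    # tampers with part of its output.
    results = [map_task(r) for r in records]
    if malicious:
        results[0] = ("forged", 999)
    return results

def digest(results):
    # Commit to a task's output with a collision-resistant hash.
    h = hashlib.sha256()
    for key, value in sorted(results):
        h.update(f"{key}:{value};".encode())
    return h.hexdigest()

def verify_by_replication(records, worker_a_malicious=False):
    # Schedule the same task on two independent workers and accept the
    # result only if their output digests agree.
    out_a = run_on_worker(records, malicious=worker_a_malicious)
    out_b = run_on_worker(records)
    return digest(out_a) == digest(out_b)
```

Comparing fixed-size digests rather than full outputs keeps the verification traffic small, which matters for preserving MapReduce's scalability.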
SecureMR consists of five security components, which provide a set of practical security mechanisms that not only ensure MapReduce service integrity and prevent replay and Denial of Service (DoS) attacks, but also preserve the simplicity, applicability, and scalability of MapReduce. We have implemented a prototype of SecureMR based on Hadoop, an open-source MapReduce implementation. Our analytical study and experimental results show that SecureMR can ensure data processing service integrity while imposing low performance overhead.

The second work targets a scalable data storage system: BigTable. BigTable is a distributed storage system designed to manage large-scale structured data. Deploying BigTable in a public cloud is an economical storage solution for small businesses and researchers who need to perform data processing tasks over large amounts of data but often lack the capability to obtain their own powerful clusters. As one may not always trust the public cloud provider, one important security issue is ensuring the integrity of the data managed by BigTable running in the cloud. In this work, we present iBigTable, an enhancement of BigTable that provides scalable data integrity assurance. We explore the practicality of different authenticated data structures for BigTable and design a set of security protocols to efficiently and flexibly verify the integrity of data returned by BigTable. More importantly, iBigTable preserves the simplicity, applicability, and scalability of BigTable, so that existing applications over BigTable can interact with iBigTable seamlessly with minimal or no code changes (depending on the mode of iBigTable). We implement a prototype of iBigTable based on HBase, an open-source BigTable implementation. Our experimental results show that iBigTable imposes reasonable performance overhead while providing high integrity assurance.

The third work targets the integrity issue of outsourced databases.
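Authenticated data structures of the kind iBigTable explores, and the Merkle B-tree used in the third work, rest on hash-tree verification: the client keeps only a small trusted root digest and checks each returned item against a path of sibling hashes. A minimal binary-Merkle-tree sketch, illustrative only and not the thesis's actual structures or protocols:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    # Build the root digest of a binary Merkle hash tree over the leaves.
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    # Collect sibling hashes from leaf to root: the "verification object"
    # a server returns alongside query results.
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    # Recompute the root from the returned item and its proof, then
    # compare against the client's trusted root digest.
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The client's storage cost is one digest, and each verification touches only a logarithmic number of sibling hashes, which is what makes such structures scalable.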
Database outsourcing has become increasingly popular as a cost-effective way to provide database services to clients. Previous work has proposed different approaches to ensuring data integrity, one of the most important security concerns in database outsourcing. However, to the best of our knowledge, existing approaches require modifications to existing DBMSs, which greatly hampers the adoption of database outsourcing. In this work, we focus on the design and implementation of an efficient and practical scheme based on the Merkle B-tree, which provides integrity assurance, including correctness, completeness, and freshness, without requiring any modification to existing DBMSs. We design a novel scheme to serialize a Merkle B-tree (MBT) into a database while enabling highly efficient retrieval of authentication data for integrity verification, which makes the scheme attractive and practical. We create appropriate indexes and design efficient algorithms to accelerate query processing with integrity protection. We build a proof-of-concept prototype and conduct extensive experiments to evaluate the performance overhead. The results show that our scheme imposes low overhead for queries and reasonable overhead for updates while ensuring the integrity of an outsourced database without DBMS modification.
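The serialization idea, storing the hash tree's nodes as ordinary rows so that an unmodified DBMS can serve authentication data through plain indexed SQL, can be sketched as follows. The schema, node numbering, and helper functions here are hypothetical illustrations, not the thesis's actual layout or algorithms:

```python
import hashlib
import sqlite3

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

# Merkle-tree nodes stored as ordinary rows, so any stock DBMS can serve
# authentication data with plain indexed queries (no engine changes).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mbt (id INTEGER PRIMARY KEY, parent INTEGER, hash BLOB)")
conn.execute("CREATE INDEX idx_parent ON mbt (parent)")

# A tiny two-level tree: leaves 1-4 under internal nodes 5 and 6, root 7.
leaves = [b"alice", b"bob", b"carol", b"dave"]
n5 = h(h(leaves[0]) + h(leaves[1]))
n6 = h(h(leaves[2]) + h(leaves[3]))
rows = [(1, 5, h(leaves[0])), (2, 5, h(leaves[1])),
        (3, 6, h(leaves[2])), (4, 6, h(leaves[3])),
        (5, 7, n5), (6, 7, n6), (7, None, h(n5 + n6))]
conn.executemany("INSERT INTO mbt VALUES (?, ?, ?)", rows)

def verify_leaf(leaf: bytes, leaf_id: int, trusted_root: bytes) -> bool:
    # Recompute the root from the leaf, fetching sibling hashes with
    # ordinary indexed lookups; smaller ids are left children here.
    node, nid = h(leaf), leaf_id
    while True:
        parent = conn.execute(
            "SELECT parent FROM mbt WHERE id=?", (nid,)).fetchone()[0]
        if parent is None:
            return node == trusted_root
        sib_id, sib_hash = conn.execute(
            "SELECT id, hash FROM mbt WHERE parent=? AND id<>?",
            (parent, nid)).fetchone()
        node = h(sib_hash + node) if sib_id < nid else h(node + sib_hash)
        nid = parent
```

Because the path retrieval is just a handful of indexed point queries, the client library can do all hashing and comparison itself, which is the sense in which such a scheme needs no DBMS modification.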
Keywords/Search Tags: Data, Integrity, Processing, Over, Cloud, Practical, Large, Bigtable