Techniques for Optimizing Cost of Enterprise Data Management

Posted on: 2016-07-09
Degree: Ph.D.
Type: Dissertation
University: University of Minnesota
Candidate: Mandagere, Nagapramod
Full Text: PDF
GTID: 1479390017480880
Subject: Computer Science
Abstract/Summary:
Rapid adoption of data-driven decision making, the Internet of Things (IoT), and mass digitization of content has led to unprecedented changes in data storage requirements. The traditional boundaries between online, nearline, and offline data are increasingly blurred, with fuzzier interactions between real-time transactional and analytics workloads. Optimizing the Total Cost of Ownership (TCO) of data storage infrastructure, characterized by capital/acquisition costs and operational management costs, necessitates a radical redesign of that infrastructure to cope with exponential data growth, not just in terms of capacity but also in the veracity and velocity of data. Towards this goal, we make the following key contributions:
a) Improving operational efficiency through application-aided storage power management: The energy consumption of storage solutions significantly affects the operational efficiency of data management. We propose a storage solution called GreenStor, built around MAID (Massive Array of Idle Disks), that uses extent-based cache management to make data movement more scalable and efficient and thereby aid energy conservation.
b) Improving operational efficiency of data protection through model-based approaches: Operational inefficiencies and scalability issues in data protection systems stem mainly from the use of static, policy-based management. Using a data-driven approach, we characterize these inefficiencies and propose a model-based dynamic backup scheduling framework that attempts to address key scalability and performance limitations of current backup systems.
c) Improving cost efficiency of data protection using commodity Software Defined Storage (SDS): Continuous Data Protection (CDP) enables recovery to any point in time (time travel) by journaling every write a system makes to disk. We propose cCDP, a Cloud CDP framework that efficiently combines cloud object stores with edge caching to address the requirements of low cost, high capacity, low latency, and high storage throughput. The operational efficiency and usability of CDP are a function of how efficiently data can be restored after a failure. To address recovery requirements, we propose a novel method of organizing the layout of CDP logs on object storage to optimize temporal search, and an object naming encoding scheme coupled with a Trie-based queueing mechanism to optimize spatial search.
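To make the idea of point-in-time recovery from a write journal concrete, here is a minimal Python sketch. It is not the cCDP design described in the dissertation: the names `JournalEntry`, `WriteJournal`, `record`, and `restore` are hypothetical, the journal is held in memory rather than on an object store, and the temporal search is a plain binary search over timestamps instead of the log layout and Trie-based scheme the abstract mentions.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    """One journaled write (hypothetical record layout)."""
    timestamp: int  # logical time of the write
    block: int      # block address that was written
    data: bytes     # contents written to that block


class WriteJournal:
    """In-memory stand-in for a CDP journal; assumes time-ordered appends."""

    def __init__(self):
        self._entries = []

    def record(self, timestamp, block, data):
        # Every write is appended, never overwritten, so history is preserved.
        if self._entries and timestamp < self._entries[-1].timestamp:
            raise ValueError("writes must be journaled in time order")
        self._entries.append(JournalEntry(timestamp, block, data))

    def restore(self, point_in_time):
        # Temporal search: find how many entries fall at or before the target.
        timestamps = [entry.timestamp for entry in self._entries]
        cutoff = bisect_right(timestamps, point_in_time)
        # Replay those entries in order; later writes to a block win.
        image = {}
        for entry in self._entries[:cutoff]:
            image[entry.block] = entry.data
        return image


journal = WriteJournal()
journal.record(10, 0, b"alpha")
journal.record(20, 1, b"beta")
journal.record(30, 0, b"gamma")

# State of the volume as of time 25: the overwrite at time 30 is not applied.
snapshot = journal.restore(25)
```

Replaying the whole prefix of the journal is the simplest possible recovery path. Its cost grows with journal length, which is the kind of restore inefficiency that a purpose-built log layout and search structure would aim to reduce.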
Keywords/Search Tags:Data, Management, Cost, Storage, Operational efficiency, CDP