
File system virtualization and service for grid data management

Posted on: 2009-07-29
Degree: Ph.D.
Type: Dissertation
University: University of Florida
Candidate: Zhao, Ming
Full Text: PDF
GTID: 1448390002999912
Subject: Computer Science
Abstract/Summary:
Large-scale distributed computing systems, such as computational grids, aggregate computing and storage resources from multiple organizations to foster collaboration and facilitate problem solving through shared access to large volumes of data and high-performance machines. Data management in these systems is particularly challenging because of the heterogeneity, dynamism, size, and distribution of such grid-style environments. This dissertation addresses these challenges with a two-level data management system, in which file system virtualization provides application-tailored grid-wide data access, and service-based middleware enables autonomic management of data provisioning.

The diversity of applications and resources requires a data provisioning solution that can be transparently deployed, whereas dynamic, wide-area environments necessitate tailored optimizations for data access. To achieve these goals, this dissertation proposes grid-wide virtual file systems (GVFS), a novel approach that virtualizes existing kernel distributed file systems (NFS) with user-level proxies and provides transparent cross-domain data access to applications. User-level enhancements designed for grid-style environments are built on the GVFS virtualization layer, including: customizable disk caching and multithreading for high-performance data access, efficient consistency protocols for application-desired data coherence, strong and grid-compatible security for secure grid-wide data access, and reliability protocols supporting application-transparent failure detection and recovery. Based on GVFS, data sessions can be created on demand on a per-application basis, and each session can apply and configure these enhancements independently.

The second level of the proposed data management system addresses the problems of managing data provisioning in a large, dynamic system: how to control data access for many applications based on their needs, and how to optimize it automatically according to high-level objectives. It proposes service-based middleware to manage the life cycles and configurations of dynamic GVFS sessions. These data management services can exploit application knowledge to flexibly customize data sessions, and they support interoperability with other middleware based on the Web Services Resource Framework. To further reduce the complexity of managing data sessions and to adapt them promptly to changing environments, an autonomic data management system is built by evolving these services into self-managing elements. Autonomic functions are integrated into the services to provide goal-driven automatic control of GVFS sessions in aspects including cache configuration, data replication, and session redirection.

A prototype of the proposed system is evaluated with a series of experiments based on file system benchmarks and typical grid applications. The results demonstrate that GVFS can transparently enable on-demand grid-wide data access with application-tailored enhancements; that the proposed enhancements can achieve strong cache consistency, security, and reliability, and substantially outperform traditional DFS approaches (NFS) in wide-area networks; and that the autonomic services support flexible and dynamic management of GVFS sessions and can automatically optimize them for performance and reliability in the presence of changing resource availability.
Keywords/Search Tags: Data, System, GVFS, Virtualization