Cluster-based storage systems with high scalability

Posted on:2006-06-08

Degree:Ph.D

Type:Dissertation

University:The University of Nebraska - Lincoln

Candidate:Zhu, Yifeng

GTID:1458390005996394

Subject:Computer Science

Abstract/Summary:

In recent years, high-end computing has undergone two significant changes: (1) an increasing focus on data-intensive applications, such as data mining, computational biology, and high-energy physics, and (2) a paradigm shift from tightly coupled, proprietary high-end computing systems to loosely coupled, cost-effective platforms built from networked commodity machines, known as clusters. As a result, a reliable and scalable storage infrastructure for clusters has become increasingly crucial for high-end computing. This dissertation investigates how effectively the existing disks in a cluster can be used to build a cluster-based storage system. It addresses the key problems that limit the scalability of such systems at four levels: the block data level, the metadata level, the file data level, and the application level.

At the block data level, this dissertation proposes RACE, a novel and simple cache replacement scheme. RACE differentiates the locality of I/O streams by actively detecting the access patterns inherently exhibited in two correlated spaces: the discrete block space of the program contexts from which I/O requests are issued, and the continuous block space within the files to which I/O requests are addressed. In terms of hit ratio, RACE significantly outperforms LRU and all other state-of-the-art cache management schemes studied in this dissertation. At the metadata level, this dissertation exploits the temporal locality of metadata accesses to improve metadata access performance. It does so by designing a Hierarchical Bloom filter Array (HBA) scheme that decentralizes metadata management. Our implementation indicates that HBA with 16 metadata servers can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9. The dissertation also proposes a theoretical model that incorporates staleness to estimate the false rates of Bloom filters, in support of adaptive Bloom filter updating.
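The metadata-level idea can be illustrated with a small sketch. It assumes a two-level lookup. The first level remembers the home server of recently accessed files; here it is modeled as an exact LRU dictionary rather than a Bloom filter array, to keep the code short. The second level holds one Bloom filter per metadata server. The class name `TwoLevelMetadataLocator`, the parameters `m_bits`, `k` and `recent_capacity`, and the SHA-256 hashing are all hypothetical choices, not taken from the dissertation. `false_positive_rate` is the textbook estimate for a fresh filter. It does not model staleness, which is what the dissertation's own model adds.

```python
import hashlib
import math
from collections import OrderedDict
from typing import List


def false_positive_rate(m_bits: int, k: int, n: int) -> float:
    """Textbook estimate for a fresh (non-stale) Bloom filter with m bits,
    k hash functions and n inserted keys: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m_bits)) ** k


class BloomFilter:
    """Plain Bloom filter; the k bit positions are derived from SHA-256."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class TwoLevelMetadataLocator:
    """Hypothetical HBA-style locator (not the dissertation's code): an LRU
    level for recently accessed files, backed by one Bloom filter per
    metadata server."""

    def __init__(self, num_servers: int, m_bits: int = 8192, k: int = 4,
                 recent_capacity: int = 64) -> None:
        self.full = [BloomFilter(m_bits, k) for _ in range(num_servers)]
        self.recent = OrderedDict()  # path -> home server, oldest first
        self.capacity = recent_capacity

    def _touch(self, path: str, server: int) -> None:
        # Mark path as most recently used and evict the oldest entry if full.
        self.recent[path] = server
        self.recent.move_to_end(path)
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)

    def record(self, path: str, server: int) -> None:
        """Register that `server` holds the metadata of `path`."""
        self.full[server].add(path)
        self._touch(path, server)

    def lookup(self, path: str) -> List[int]:
        """Return candidate home servers for the metadata of `path`."""
        if path in self.recent:  # level 1: recently accessed files
            self._touch(path, self.recent[path])
            return [self.recent[path]]
        hits = [s for s, bf in enumerate(self.full) if path in bf]
        if len(hits) == 1:  # level 2: a unique Bloom filter hit
            self._touch(path, hits[0])
        # 0 hits: unknown path or stale filters; >1 hits: false positives.
        # In both cases the caller must query candidates or multicast.
        return hits
```

A lookup that yields exactly one server can be sent straight to that server. Only ambiguous results need extra messages, which is where the low false-positive rate of each filter pays off.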
At the file data level, this dissertation proposes using redundant data to improve the performance of large data accesses by dynamically scheduling I/O requests among data servers. At the application level, this work conducts a case study of parallel BLAST, a popular I/O-intensive application, and uses it as a benchmark to evaluate the techniques proposed at the file data level.
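The file-level idea can be sketched under a simple assumption: each block has copies on several data servers, and the client can observe how many requests are already queued at each server. The sketch uses a greedy least-loaded policy. It sends each block read to the replica holder with the shortest queue so far. The function `schedule_reads`, its `busy` parameter, and the greedy policy are all hypothetical illustrations, not the scheduler described in the dissertation.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def schedule_reads(replicas: Dict[int, List[str]],
                   busy: Optional[Dict[str, int]] = None) -> Dict[str, List[int]]:
    """Hypothetical greedy least-loaded replica selection.

    replicas: block id -> servers holding a copy of that block
    busy:     requests already queued at each server (assumed observable)
    Returns:  server -> block ids it should serve, in block order.
    """
    load = defaultdict(int, busy or {})
    plan: Dict[str, List[int]] = defaultdict(list)
    for block in sorted(replicas):
        # Choose the replica holder with the shortest queue; break ties by
        # server name so that the result is deterministic.
        server = min(replicas[block], key=lambda s: (load[s], s))
        load[server] += 1
        plan[server].append(block)
    return dict(plan)


blocks = {b: ["ds1", "ds2"] for b in range(4)}  # 2-way replicated file
print(schedule_reads(blocks))                    # {'ds1': [0, 2], 'ds2': [1, 3]}
print(schedule_reads(blocks, busy={"ds1": 3}))  # {'ds2': [0, 1, 2], 'ds1': [3]}
```

Without redundancy, every block would have to come from its single holder regardless of queue length. With copies, the second call shows a busy server being bypassed until the queues even out.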

Keywords/Search Tags:

Data, Cluster-based storage, Application, I/O requests, Systems

Related items

1	Study Of Store-and-Forward In Optical Circuit Switching Network
2	Research On Scalable High Performance Web Server Systems
3	Research On A High Reliable And Scalable Cluster-based Storage System
4	Research Of Data Synchronization On Storage Technology Based On Cloud Storage And P2P
5	Research On Behavior Of Throughput Collapse In Cluster Based Storage Network
6	Staged database systems
7	Research And Design Of Iscsi-based Storage Cluster
8	Design And Implementation Of A Acquisition, Processing And Storage Data System Oriented To Cluster Monitoring
9	Research And Implementation Of Cluster Storage System Based On Linux Environment
10	The Research Of The Key Technology In The Cluster NAS Storage System CLNASFS