Research On Interface Semantic Extension For Object-based Parallel File System

Posted on: 2012-11-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Tu
GTID: 1118330335455063
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of information technology, recent advances in storage system technologies and high-performance interconnects have made it possible to build increasingly powerful storage systems that serve thousands of nodes. However, as storage requirements grow, software parallelism that exploits the current hardware more effectively is the key to addressing the emerging bottlenecks and scalability limits. Currently, the majority of cluster storage systems are managed by scalable parallel file systems such as GPFS, PVFS, Ceph, Lustre and PanFS, and these solutions have been widely adopted by systems on the Top 500 supercomputer list. This dissertation therefore focuses on the architecture and implementation methodology of a highly efficient parallel file system. To support high performance computing (HPC) well, the research further covers interface semantics extensions that optimize performance to meet the I/O requirements of HPC, layout-aware approaches that optimize the I/O patterns of parallel jobs for data-intensive scalable computing, a case study on coupling the parallel file system with a computing framework, and redundancy coding in the parallel file system.

The design and implementation of a massive object-based parallel file system, named CapFS, gives the proposed prototype several characteristics. It has customized data distribution strategies, remote direct data access based on the object-based storage (OSD) protocol, and transactional persistent data management. In detail, the proposed nested-RAID scheme, as the unified model and algorithm for data layout, enables client-driven layout computation and maintains a consistent notion of a file's layout that provides POSIX semantics without restricting concurrent access to the file.
Given the flat namespace service and scalable attribute management of the OSD profile, an in-kernel mini database manager combined with the local file system is proposed to handle efficient object-based access and persistence management, and to serve variable-size objects and well-structured attributes differently. The OSD-over-RPC mechanism offers clients direct object-based access to the available shared-everything OSDs and supports protocol negotiation to resolve semantic mismatches among multiple storage transfer protocols. Tunable parameters and test results for the whole system verify its effectiveness and good scalability.

Considerable evidence and analysis show that the traditional POSIX interface cannot adequately support parallel HPC applications whose I/O access patterns often consist of accesses to a large number of small, non-contiguous pieces of data. These applications produce interleaved file access patterns with high inter-process spatial locality at the I/O nodes and demand high metadata throughput. Extensions are needed so that high-concurrency, high-performance computing applications running on top of the rapidly prototyped parallel file system perform well. Four types of interface extensions are therefore presented to match storage I/O semantics to the applications above: shared file descriptors for concurrent I/O, optimizations for non-contiguous I/O, lazy and bulk metadata operations, and layout control that preserves POSIX semantics. These extensions to the POSIX I/O interface were deployed on a clustered file system with a high-speed interconnect. Experimental results on a set of micro-benchmarks confirm that the extensions to the popular interface greatly improve scalability and performance over traditional methods.

From a bottom-up perspective, consider a popular parallel computing framework as an example.
It is easy to see that heavy intermediate data copying and communication costs are caused by the semantic mismatch between the existing I/O model and the parallel computing framework. In contrast to a traditional distributed file system, the proposed layout, parameterized by I/O-aware information, helps to implement the MapReduce computing framework over CapFS. I/O benchmarks and tests with real applications demonstrate that this kind of parallel computation can run on the parallel file system, and that parallel I/O with several optimized, locality-aware functions meets the requirement of shipping code to data more feasibly and flexibly than the Hadoop Distributed File System. Among the three kinds of applications (computation-intensive, I/O-intensive, and both), the proposed scheme improves the speed-up ratio most for I/O-intensive applications.

From a top-down perspective, an erasure code was implemented with the parallel computing framework to provide better reliability and availability. This solution enables asynchronous compression of initially triplicated data down to RAID-class redundancy overheads, with the algorithms implemented on the MapReduce framework. Based on the algorithm, CapFS implements a redundant data management framework that supports redundancy at different levels, including within and across files, multiple user groups, and devices. In most existing solutions, parity data is created on the client side and transported between clients and servers. In contrast, the proposed redundancy method is asynchronous and fully transparent to the clients' runtime, and parity computation and loss recovery can themselves be treated as parallel processing procedures. The experiments use a metadata trace from Yahoo clusters and demonstrate the efficiency of the proposed algorithm and framework.
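
As a rough illustration of what "RAID-class redundancy" means in the paragraph above, the following is a minimal sketch that uses single XOR parity as a stand-in. The abstract does not specify the actual erasure code used in CapFS, so the function names (`xor_blocks`, `encode_stripe`, `recover_block`) and the choice of XOR parity are assumptions made only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode_stripe(data_blocks):
    """Compute one parity block for a stripe (hypothetical RAID-5-style code)."""
    return xor_blocks(list(data_blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"aaaa", b"bbbb", b"cccc"]
    parity = encode_stripe(stripe)
    # Pretend the second block was lost and rebuild it.
    rebuilt = recover_block([stripe[0], stripe[2]], parity)
    print(rebuilt == stripe[1])
```

In this sketch, three data blocks plus one parity block occupy 4/3 of the raw data size, whereas triplication occupies 3 times the raw data size. Because encoding needs only the stored blocks, it can run after the initial replicated write completes, which is the sense in which such a step can be asynchronous and invisible to clients.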
Keywords/Search Tags: object-based storage, parallel file system, interface extension, parallel computing framework, asynchronous erasure code