Analysis And Optimization Of Data I/O Pass In The Distributed File System

Posted on: 2014-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: H F Sun
GTID: 2268330422963461
Subject: Computer system architecture
Abstract/Summary:
In the big data era, storage systems play an increasingly important role in many practical applications. The efficiency of the storage system directly determines the performance of the applications running on it. Today, most file systems meet the growing demand for performance by increasing their scale, which brings problems of its own, such as rising costs and the difficulty of maintaining them. Object-based file systems delegate responsibility to the object servers, but they overlook the fact that all nodes in a system form an organic whole. The key to a successful storage system lies in making full use of every node and of the interconnection network, which is also the key to improving the system's scalability and availability.

In this thesis, we focus on the process of data I/O and optimize this key factor. The design and implementation are based on Cappella, an object-based distributed file system developed in our lab.

For writing data, we develop a dynamic data layout scheme based on the real-time workload of the data servers. Every server has a weight representing how busy it is, and this weight is collected every 3 seconds. Before data is stored, the weight determines which server is selected. The dynamic data layout scheme resolves the problems of the static scheme previously adopted in Cappella.

For reading data, we analyze the Linux kernel's prefetching strategy and propose a method suited to distributed environments that overcomes its drawbacks. The Linux kernel's prefetching strategy was designed for local file systems with disks as the storage device, and it falls short in a distributed environment. In large distributed file systems, file data is stored on dedicated devices or servers that are connected by a high-speed interconnection network. We therefore propose a new prefetching method for distributed environments that fully accounts for the influence of the interconnection network and the data layout.

Performance measurements on Cappella show that file data is scattered evenly across all storage servers according to their weights, and that the new prefetching method improves performance by at least 30% and at most 90% under sequential access or large-block random access.
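As a rough illustration of the dynamic data layout idea described above, the following Python sketch keeps the most recent weight reported by each data server and uses it to choose a target before a write. The abstract does not say how the weight is computed or how it is turned into a choice, so everything beyond the 3-second collection interval is an assumption made here: choosing the lowest weight, the staleness cutoff, the per-write penalty, and the names `DynamicPlacer`, `report`, and `pick` are hypothetical and are not Cappella's actual interface.

```python
from dataclasses import dataclass, field

REPORT_INTERVAL_S = 3.0  # from the abstract: weights are collected every 3 seconds


@dataclass
class ServerLoad:
    weight: float       # busyness; higher means busier (assumed convention)
    reported_at: float  # time of the last report, in seconds


@dataclass
class DynamicPlacer:
    """Hypothetical weight-based placement helper (illustrative only)."""
    stale_after: float = 2 * REPORT_INTERVAL_S  # assumed staleness cutoff
    write_penalty: float = 1.0                  # assumed cost charged per placed write
    loads: dict = field(default_factory=dict)

    def report(self, server_id: str, weight: float, now: float) -> None:
        """Record the latest weight sent by a data server."""
        self.loads[server_id] = ServerLoad(weight, now)

    def pick(self, now: float) -> str:
        """Return the least busy server among those that reported recently."""
        fresh = {sid: load for sid, load in self.loads.items()
                 if now - load.reported_at <= self.stale_after}
        if not fresh:
            raise RuntimeError("no data server has reported a recent weight")
        # Tie-break on the server id so the choice is deterministic.
        chosen = min(fresh, key=lambda sid: (fresh[sid].weight, sid))
        # Charge the chosen server locally so that a burst of writes between
        # two reports does not all land on the same server.
        fresh[chosen].weight += self.write_penalty
        return chosen


placer = DynamicPlacer()
placer.report("ds1", 5.0, now=0.0)
placer.report("ds2", 1.0, now=0.0)
placer.report("ds3", 2.0, now=0.0)
picks = [placer.pick(now=1.0) for _ in range(3)]  # ['ds2', 'ds2', 'ds3']
```

The local penalty is one possible way to keep a lightly loaded server from absorbing every write within a single reporting interval; a static layout, by contrast, would ignore the reported weights entirely.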
Keywords/Search Tags: distributed file system, data layout, workload balance, prefetching