
Improving PVFS For Large Scale Dataset Processing

Posted on: 2017-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
GTID: 2308330485968084
Subject: Computer Science and Technology
Abstract/Summary:
The file system is a key component of high-performance and distributed computing, because the file system and the state of the underlying storage directly affect application performance. As high-performance and distributed computing have developed, parallel file systems have received increasing attention. PVFS (Parallel Virtual File System), a typical parallel file system, is widely used in research fields such as physics, astronomy, and geology, and several key aspects of PVFS have been studied in recent years. However, PVFS does not support dynamic extension or data migration, so it cannot meet the requirements of practical deployment environments. PVFS also lacks a caching mechanism and a corresponding cache management algorithm, which further limits its performance. This thesis addresses these two aspects and makes the following main contributions:

(1) A dynamic extension algorithm is proposed and implemented, improving the extensibility of PVFS. Based on a detailed analysis of the PVFS source code and software architecture, we make PVFS support dynamic extension. For use after cluster nodes are added, we also propose a PVFS-based data migration method that draws on load-balancing strategies. This method is driven primarily by user demand, and it aims to speed up I/O without degrading overall system performance.

(2) A caching module and a prefetching algorithm are designed and implemented on top of PVFS, improving its I/O performance. In PVFS, every I/O operation must access the hard disk, so the disk becomes a bottleneck, especially with multiple applications and multiple users. To address this, we designed and implemented an independent caching module. In addition, we use a recommendation algorithm for cache block prefetching and replacement, with GPU acceleration.

(3) Building on the two points above, this thesis designs and implements a prototype system based on PVFS that retains the advantages of PVFS while supporting both dynamic extension and cache management. We run a typical geological algorithm, reverse-time migration, on this file system to verify the effectiveness of the proposed algorithms.
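To make the caching idea in point (2) concrete, here is a minimal sketch of a read-through block cache placed in front of slow storage. It is not the thesis's implementation: the abstract describes a recommendation algorithm with GPU acceleration for prefetching and replacement, whereas this sketch substitutes plain LRU replacement and one-block sequential read-ahead. The class `BlockCache`, the `read_block` callable, and the `num_blocks` and `capacity` parameters are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class BlockCache:
    """Read-through block cache: LRU replacement plus one-block read-ahead.

    Assumption: LRU and sequential read-ahead stand in for the
    recommendation-based policy mentioned in the abstract.
    """

    def __init__(self, read_block, num_blocks, capacity):
        self.read_block = read_block   # hypothetical: block_id -> bytes (slow disk read)
        self.num_blocks = num_blocks   # total number of blocks in the backing file
        self.capacity = capacity       # maximum number of blocks held in memory
        self.blocks = OrderedDict()    # block_id -> data, least recently used first
        self.misses = 0                # demand reads that had to go to disk
        self.disk_reads = 0            # all disk reads, including prefetches

    def _load(self, block_id):
        """Read one block from storage, cache it, and evict the LRU block if full."""
        data = self.read_block(block_id)
        self.disk_reads += 1
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return data

    def read(self, block_id):
        """Return a block, serving it from memory when possible."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            data = self.blocks[block_id]
        else:
            self.misses += 1
            data = self._load(block_id)
        # Read-ahead: fetch the next block so a sequential reader hits the cache.
        nxt = block_id + 1
        if nxt < self.num_blocks and nxt not in self.blocks:
            self._load(nxt)
        return data
```

With a sequential reader, only the first request misses; later requests are served from memory because the read-ahead has already fetched them. A real prefetcher would issue the read-ahead asynchronously so that it does not delay the caller.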
Keywords/Search Tags: Parallel file system, PVFS, dynamic extension, data migration, cache management, prefetching algorithm