
Pattern-Aware Data Reorganization In MPI-IO

Posted on: 2013-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: J He
Full Text: PDF
GTID: 2248330395485402
Subject: Computer system architecture
Abstract/Summary:
Over the past decades, high-performance computers have made large-scale scientific computing, modeling, and simulation in many domains faster and more efficient, creating a great opportunity to solve challenging problems. Many scientific applications are data-intensive. For example, colliders, space telescopes, and nuclear simulations generate enormous amounts of data every second and therefore require high-performance I/O. However, while CPU speed has grown in line with Moore’s Law, I/O speed has not kept pace. The gap between CPU and I/O speed makes I/O the bottleneck of the system. This is the so-called "I/O Wall" problem, and it needs to be addressed urgently.

MPI-IO and parallel file systems are widely used to mitigate the I/O bottleneck. They bridge the gap between CPU and I/O speed by increasing the parallelism of data access. In parallel file systems, the size and contiguity of data requests are two of the most important factors for performance. However, application developers organize data according to their logical understanding of it, which can present the parallel file system with many small, non-contiguous I/O requests and seriously degrade performance. In this thesis, we propose an approach that matches data access patterns to the characteristics of parallel file systems by reorganizing the data, thereby improving the performance of the I/O system. The major contributions are as follows:

1. We propose pattern-aware data reorganization in MPI-IO to make data accesses more contiguous and to reduce the number of requests. The main advantage of reorganizing the data at the MPI-IO level is that the layers below MPI-IO can exploit the improved access patterns and further enhance performance. In the proposed approach, we analyze data access traces and reorganize the data according to the access patterns found in them. Future data accesses then follow patterns that use the file system more efficiently.

2. Based on the strategy above, we design a pattern-aware data reorganization system. First, we build a remapping table from the analysis of data access traces. When the application runs again, the remapping table is loaded into memory. When a new request arrives, the system checks whether it matches a pattern recorded in the remapping table. If it does, the remapping layer in MPI-IO converts the original request into a new request against the reorganized data.

3. We implemented and tested the pattern-aware data reorganization system in MPI-IO and PVFS2 (Parallel Virtual File System 2). We compare the I/O-signature-based remapping table with a traditional remapping table. The results show that the I/O-signature-based table takes less time to search and occupies less space. In addition, we tested and confirmed that the pattern-aware file has good fault tolerance. Moreover, we tested and analyzed performance with IOR and MPI-TILE-IO. By placing data in a way that favors the parallel I/O system, we observed gains of up to two orders of magnitude in reading and up to one order of magnitude in writing, on both spinning disks and solid-state disks.
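The remapping step described in item 2 can be illustrated with a minimal sketch. This is not the data structure used in the thesis: the `Signature` record, the `detect_stride` and `remap` helpers, and the assumption that a pattern is a single run of fixed-size, fixed-stride requests are all hypothetical and chosen only to show the idea of a compact, pattern-based table entry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Signature:
    """Hypothetical compact record for one fixed-stride access pattern."""
    start: int      # original offset of the first request
    size: int       # bytes per request
    stride: int     # distance between consecutive original offsets
    count: int      # number of requests in the pattern
    new_start: int  # offset of the first request in the reorganized file


def detect_stride(trace: List[Tuple[int, int]], new_start: int) -> Optional[Signature]:
    """Return a Signature if the whole trace is one fixed-size, fixed-stride run."""
    if len(trace) < 2:
        return None
    (first_off, size), (second_off, _) = trace[0], trace[1]
    stride = second_off - first_off
    if stride <= 0:
        return None
    for i, (off, length) in enumerate(trace):
        if length != size or off != first_off + i * stride:
            return None
    return Signature(first_off, size, stride, len(trace), new_start)


def remap(sig: Signature, offset: int, length: int) -> Optional[int]:
    """Translate an original request to its reorganized offset, or None if no match."""
    delta = offset - sig.start
    if length != sig.size or delta < 0 or delta % sig.stride != 0:
        return None
    index = delta // sig.stride
    if index >= sig.count:
        return None
    # Matching requests are packed back to back in the reorganized file.
    return sig.new_start + index * sig.size


# Trace of (offset, length) pairs: four 4-byte requests, 16 bytes apart.
trace = [(0, 4), (16, 4), (32, 4), (48, 4)]
sig = detect_stride(trace, new_start=0)
new_offset = remap(sig, 32, 4)   # third request maps to offset 8
unmatched = remap(sig, 33, 4)    # does not fit the pattern, so None
```

Under these assumptions, one signature stands in for every request in the run, whereas a per-request table would need one entry per request. That is one plausible reading of why a signature-based table could be smaller and faster to search, as the abstract reports.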
Keywords/Search Tags: data reorganization, MPI-IO, PVFS2, parallel I/O, parallel computing