Representative parallel applications often need to access large amounts of I/O data, and these access sequences typically consist of many fine-grained I/O requests. The parallel file system must therefore service small I/O accesses at high frequency, spending considerable time starting each I/O operation only to transfer a few hundred bytes. As a result, parallel applications achieve only a small fraction of the maximum I/O bandwidth the parallel file system can supply.

One important solution is to record information about the I/O access pattern as requests pass from the application down to the network driver, and to use that information to improve I/O performance. The common implementation inserts an I/O library between the application and the file system to capture the access pattern of the parallel application. The usual optimization in such a library is to combine many fine-grained I/O accesses into a few coarse-grained ones, so as to make full use of the I/O bandwidth the parallel file system can supply; this, however, introduces considerable additional I/O overhead. If the underlying parallel file system instead supplied an efficient interface for accessing fine-grained file data directly, the I/O library could be simplified and I/O performance improved.

This paper describes the I/O architecture and implementation of the Lustre file system, studies parallel I/O techniques in Lustre, implements a fine-grained "Direct I/O" access mode with corresponding interfaces, and builds an enhanced parallel I/O mode on these interfaces in the MPI-IO library. We then ran a series of I/O bandwidth benchmarks on the enhanced parallel I/O mode in the Lustre file system. The results demonstrate that the new I/O mode based on the new interfaces outperforms the I/O modes based on the old interfaces, especially when accessing structured data sets.
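The combining optimization mentioned above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: real MPI-IO libraries (e.g., ROMIO's data sieving) perform this inside the library, and the file layout, record size, and stride below are hypothetical values chosen only for illustration.

```python
import os
import tempfile

# Build a sample file of fixed-size records (hypothetical layout).
record_len = 8
nrecords = 1000
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    for i in range(nrecords):
        f.write(i.to_bytes(record_len, "big"))

# The application wants every other record: a strided, fine-grained pattern.
stride = 2 * record_len
offsets = range(0, nrecords * record_len, stride)

# Fine-grained access: one small read per record -> many I/O operations,
# each paying the full per-operation startup cost.
fine = []
with open(path, "rb") as f:
    for off in offsets:
        f.seek(off)
        fine.append(f.read(record_len))

# Coarse-grained access ("data sieving" style): one large contiguous read
# covering the whole region, then the wanted records are picked out in memory.
with open(path, "rb") as f:
    buf = f.read()
coarse = [buf[off:off + record_len] for off in offsets]

assert fine == coarse  # same data, far fewer I/O operations
os.remove(path)
```

The trade-off the abstract points at is visible here: the coarse-grained version issues one I/O operation instead of 500, but it reads (and buffers) unwanted bytes, which is the extra overhead a direct fine-grained file-system interface would avoid.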