
Supporting I/O for remote visualization of high-performance scientific simulations

Posted on: 2004-01-24
Degree: Ph.D.
Type: Thesis
University: University of Illinois at Urbana-Champaign
Candidate: Lee, Jonghyun
Full Text: PDF
GTID: 2458390011953261
Subject: Computer Science
Abstract/Summary:
Scientific data generated by large-scale parallel simulations are usually visualized or post-processed on a different, often remote, platform. This distributed setup can pose three I/O performance challenges. First, simulation codes periodically write copious data to disk. Second, these data must be migrated to the remote platform, and network throughput is typically much lower than that of a parallel file system. Third, typical post-simulation activities on the migrated data, such as visualization, are read-intensive and can be slowed by load imbalance when they run on heterogeneous disks, which are common in modern clusters. Without intelligent approaches to these challenges, I/O can become a serious performance bottleneck.

This thesis presents techniques to address these I/O performance issues. First, for efficient data migration, we propose an architecture that integrates a parallel I/O library with a migration engine. We examine the use of data compression and a novel buffering scheme within this integrated architecture to reduce application turnaround time. We also introduce performance models for several I/O and migration methods, and show how these models can be used to control the use of I/O and migration resources. Second, we study data declustering across heterogeneous disks. Declustering distributes data over multiple disks, enabling efficient execution of visualization queries that retrieve only the areas of interest in each data set. We show how virtual servers make it easy to adapt existing declustering approaches to a heterogeneous disk environment, and propose methods and algorithms for deciding the number of virtual servers and the mapping of virtual servers to disks.

We present experimental results with the Panda parallel I/O library showing that our proposed approach to data migration can significantly reduce application turnaround time. We also show that our declustering approaches can reduce the retrieval time for visualization queries on heterogeneous disks, while also reducing performance variance across different queries that retrieve the same amount of data.
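
The summary does not say how the number of virtual servers or their mapping to disks is chosen, so the Python sketch below only illustrates the general virtual-server idea under loudly stated assumptions; it is not the thesis's algorithm. It assumes each disk's speed is known as a relative bandwidth (chunks per second), allocates virtual servers to disks in proportion to bandwidth using largest-remainder rounding, and declusters a 2-D chunk grid over the virtual servers with the classic Disk Modulo scheme, a homogeneous declustering method used here as a stand-in. The function names, the bandwidth figures, and the cost model (the slowest disk determines query time) are all hypothetical.

```python
from collections import Counter
from math import floor


def allocate_virtual_servers(bandwidths, num_virtual):
    """Hypothetical helper: map virtual servers to physical disks in proportion
    to disk bandwidth, using largest-remainder rounding.

    Returns a list whose index is a virtual server id and whose value is the
    physical disk hosting that virtual server.
    """
    total = sum(bandwidths)
    quotas = [num_virtual * b / total for b in bandwidths]
    counts = [floor(q) for q in quotas]
    leftover = num_virtual - sum(counts)
    # Hand the remaining virtual servers to the disks with the largest remainders.
    by_remainder = sorted(range(len(bandwidths)),
                          key=lambda d: quotas[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1
    # Contiguous assignment: virtual servers 0..counts[0]-1 go to disk 0, and so on.
    return [disk for disk, c in enumerate(counts) for _ in range(c)]


def disk_modulo(i, j, num_virtual):
    """Disk Modulo declustering: chunk (i, j) goes to virtual server (i + j) mod V."""
    return (i + j) % num_virtual


def query_time(bandwidths, vs_to_disk, rows, cols):
    """Assumed cost model for a rectangular query: each disk reads its chunks at
    its own bandwidth, and the slowest disk determines completion time."""
    num_virtual = len(vs_to_disk)
    per_disk = Counter(vs_to_disk[disk_modulo(i, j, num_virtual)]
                       for i in rows for j in cols)
    return max(per_disk[d] / bw for d, bw in enumerate(bandwidths))


# Hypothetical disks: disk 0 is twice as fast as disks 1 and 2.
bandwidths = [2, 1, 1]
one_per_disk = allocate_virtual_servers(bandwidths, 3)  # [0, 1, 2]
weighted = allocate_virtual_servers(bandwidths, 4)      # [0, 0, 1, 2]
print(query_time(bandwidths, one_per_disk, range(4), range(4)))  # 5.0
print(query_time(bandwidths, weighted, range(4), range(4)))      # 4.0
```

With one virtual server per disk, a 4 x 4 query places 6, 5, and 5 chunks on the three disks, so the two slow disks dominate. With four virtual servers, two of which live on the fast disk, the same query places 8, 4, and 4 chunks, and all disks finish together. The declustering function itself is unchanged; only the extra indirection through virtual servers adapts it to unequal disks. The contiguous mapping is the simplest possible choice and may balance small queries poorly.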
Keywords/Search Tags: I/O, Data, Performance, Visualization, Remote, Heterogeneous disks, Parallel