
Key Technology Research Of Magnitude Data Preprocess Based On Hierarchical Data Format

Posted on: 2006-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Xie
Full Text: PDF
GTID: 2178360185963642
Subject: Computer Science and Technology
Abstract/Summary:
Scientific visualization is an effective way to analyze massive datasets. The datasets produced by numerical simulation and remote sensing are typically large-scale, high-dimensional, complex, and time-varying, which challenges traditional dataset-preprocessing techniques.

The Hierarchical Data Format 5 (HDF5) library is the leading software for managing large scientific datasets and is becoming the main international standard. Building on an in-depth analysis of the HDF5 library's I/O access mechanisms, as well as its compression, partitioning, and parallel-access facilities, this dissertation studies and implements several key techniques, including a scientific data compression algorithm, a parallel search model and transform algorithms, and parallel visualization algorithms, to meet the demands of visualizing extremely large scientific datasets. The main work of the dissertation is as follows:

1. Traditional data compression algorithms applied to scientific data suffer from low compression ratios, expensive compression and decompression times, and poor suitability for massive scientific data, so the thesis focuses on the Rice compression algorithm. Because Rice coding alone cannot eliminate data redundancy efficiently, the thesis proposes a new two-dimensional difference prediction scheme, which supports both second-order difference prediction and "Zig-Zag" scan difference prediction. Experiments show the new scheme is more efficient and achieves a compression ratio 3.9-30.1% higher than the original Rice algorithm.

2. The dissertation presents a Data Serial Transform (DST) for the HDF5 library based on a depth-first search model. Experimental results show the algorithm is highly efficient for small and medium-sized datasets that do not exceed available memory.
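As a minimal, hypothetical sketch of the depth-first traversal a DST-style serial transform could perform, the example below models an HDF5-like hierarchy as nested dicts (groups) whose leaves are datasets; the function name, path scheme, and structure are illustrative assumptions, not the thesis's actual implementation, which would walk real HDF5 groups through the library's iteration API.

```python
# Illustrative sketch (not the thesis's code) of a depth-first,
# one-object-at-a-time traversal, as a DST-style serial transform
# might perform over an HDF5 file's group hierarchy.

def dst_visit(group, path="/", visit=None, order=None):
    """Depth-first traversal: each dataset is transformed serially
    as it is reached; returns the visitation order of dataset paths."""
    if order is None:
        order = []
    for name, obj in group.items():
        child = path.rstrip("/") + "/" + name
        if isinstance(obj, dict):     # sub-group: descend first
            dst_visit(obj, child, visit, order)
        else:                         # dataset: apply the transform here
            if visit:
                visit(child, obj)
            order.append(child)
    return order

# A toy HDF5-like file: one group with two datasets, plus a root dataset.
hdf5_like = {"grid": {"t0": [1, 2], "t1": [3, 4]}, "meta": [0]}
```

Because every object is visited and transformed serially, the whole dataset effectively streams through a single process, which is consistent with the efficiency drop the thesis reports once datasets outgrow memory.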
However, its efficiency declines roughly linearly as the dataset grows. To solve this problem with the DST, we studied parallel data transformation and present a Data Collective Parallel Transform (DCPT) algorithm, which efficiently partitions large-scale datasets and processes them in parallel. Experimental results show that DCPT is more efficient than DST, taking only 26.3-84.3% of the DST's processing time.

3. HDF5 data files are characterized by many data objects and complex data structures. Because DCPT handles complex data structures inefficiently, a Data Independent Parallel Transform (DIPT) algorithm is proposed. DIPT broadens the scope of parallel communication, adopts a set-aside process to monitor metadata changes in the file, and lets each process handle its own data objects during parallel processing. Experimental results indicate that, on HDF5 files with many data objects, DIPT is 33.3-66.7% more efficient than DCPT.

Building on the work above, the ParaView system was adopted to encapsulate the data format-transform algorithms as a module. This system was then used to analyze the parallel rendering...
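To illustrate the collective partitioning idea behind a DCPT-style transform, here is a minimal, hypothetical sketch of dividing a dataset's rows into contiguous slabs across processes; the function name and the remainder-balancing rule are assumptions for illustration, not the thesis's actual partitioning scheme.

```python
# Hypothetical sketch of block partitioning for a collective parallel
# transform: each of `nprocs` processes owns one contiguous slab of
# rows, with the remainder spread over the lowest-numbered ranks.

def slab_for_rank(nrows, nprocs, rank):
    """Return the half-open (start, stop) row range owned by `rank`."""
    base, extra = divmod(nrows, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

In an HDF5 setting, each process would translate its (start, stop) range into a hyperslab selection and read or transform only that slab, so the full dataset never has to reside in any single process's memory.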
Keywords/Search Tags: Hierarchical Data Format, HDF5, Data Compression, Data Collective Parallel Transform, Data Independent Parallel Transform, ParaView