Research On Massive Astronomical Data Oriented Distributed Storage Engine

Posted on: 2015-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Yu
Full Text: PDF
GTID: 2298330452459588
Subject: Computer Science and Technology
Abstract/Summary:
With the development of astronomical observation, the volume of astronomical data is increasing rapidly, which makes storing and searching the data a major problem. Traditional single-node file systems and relational databases perform unacceptably when handling massive astronomical data. It is therefore necessary to develop distributed storage engines for massive astronomical data.

Astronomical data can be divided into star catalogs and star images, which have different data formats and application scenarios. This paper designs two distributed storage solutions based on the features of star catalogs and star images, thereby accelerating access to both kinds of data. A distributed file system is used to access the star images. Based on the features of the star images, this paper first proposes a hybrid data access model to reduce the throughput bottleneck of the distributed file system, and then implements this model on the open-source distributed file system OrangeFS. FastBit, a column storage engine based on bitmap indexes, is used for the star catalogs. This paper designs and implements a distributed storage engine based on FastBit, proposing a distributed data partitioning algorithm for FastBit and a parallel query algorithm based on SQL analysis to solve FastBit's memory problem when dealing with massive astronomical data.

This paper is divided into two parts. The first part analyzes the throughput bottlenecks of client nodes in a distributed file system when dealing with different file sizes and different scales of data, and proposes the hybrid data access model. It then describes the principles and processes of the model and how the model is applied to the distributed file system OrangeFS. Benchmarks and astronomical applications are used to test and verify that the hybrid data access model accelerates access to the star images.

The second part first analyzes the problems of traditional relational database systems when dealing with massive star catalogs, and introduces the bitmap-index-based column storage engine along with its memory problem. It then proposes the distributed data partitioning algorithm, the parallel search algorithm based on SQL analysis, and the architecture of the distributed FastBit. Finally, it uses star catalog data for experimental verification and analyzes the experimental results.
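
The general idea behind the second part (partition the rows, build a bitmap index per partition, run a range query on every partition in parallel, then merge) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm and does not use the FastBit API: the round-robin partitioning, the equal-width binning, and all names (`partition_rows`, `BitmapPartition`, `parallel_query`, the `mag` column) are assumptions made for illustration only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_partitions):
    """Round-robin split (an assumed, deliberately simple partitioning rule)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


class BitmapPartition:
    """One partition holding a binned bitmap index over a numeric column."""

    def __init__(self, rows, column, bin_width):
        self.rows = rows
        self.column = column
        self.bin_width = bin_width
        # bin id -> Python int used as a bitmap; bit `pos` is set when
        # the row at position `pos` falls into that bin.
        self.bitmaps = defaultdict(int)
        for pos, row in enumerate(rows):
            self.bitmaps[int(row[column] // bin_width)] |= 1 << pos

    def query_range(self, low, high):
        """Return rows with low <= value < high."""
        lo_bin = int(low // self.bin_width)
        hi_bin = int(high // self.bin_width)
        # OR together the bitmaps of every bin that overlaps the range.
        candidates = 0
        for bin_id, bitmap in self.bitmaps.items():
            if lo_bin <= bin_id <= hi_bin:
                candidates |= bitmap
        # Bins at the range edges may contain non-matching rows, so each
        # candidate is re-checked against its raw value.
        return [
            row
            for pos, row in enumerate(self.rows)
            if (candidates >> pos) & 1 and low <= row[self.column] < high
        ]


def parallel_query(partitions, low, high):
    """Run the same range query on all partitions and merge the results."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda p: p.query_range(low, high), partitions)
        return [row for part in results for row in part]


# Hypothetical catalog-style rows: an id and a magnitude value.
rows = [{"id": i, "mag": float(i)} for i in range(20)]
partitions = [BitmapPartition(p, "mag", 5.0) for p in partition_rows(rows, 3)]
hits = parallel_query(partitions, 3.0, 12.0)
```

Because each partition indexes only its own rows, no single node has to hold the bitmaps for the whole data set, which is the kind of memory pressure the abstract refers to; a real deployment would place partitions on separate machines rather than threads.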
Keywords/Search Tags: Massive astronomical data, distributed storage, bitmap index, OrangeFS, FastBit, data layout