
The Research For Key Technology Of Astronomy Big Data Integration Based On Spark

Posted on: 2019-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Tian
Full Text: PDF
GTID: 2370330572968151
Subject: Computer application technology
Abstract/Summary:
In recent years, advances in science and technology have greatly increased the data acquisition capacity of astronomical observation equipment, and the volume of multi-band astronomical data has grown exponentially. Astronomy has gradually entered the era of "big data" with full-band sky surveys. At this scale, serial or traditional parallel processing methods are difficult to apply, and without more efficient methods it is hard to extract the information implicit in the data. The rise of distributed computing frameworks, represented by Hadoop and Spark, has changed the form of parallel computing, and these frameworks have become the basis of the next generation of big data analysis and computation. Building on previous work, this thesis discusses a series of problems and related technologies in integrating astronomical data with Spark. Taking the characteristics of astronomical data into account, we examine Spark's optimization mechanisms for parallel computing and make use of their advantages. To address the efficiency problem of big data in astronomical surveys, we carried out research in two areas. First, the archiving of astronomical data: we propose an efficient distributed generation algorithm based on the HEALPix index and Spark. The algorithm draws on the idea of a hierarchical index and uses HEALPix to archive large-scale astronomical data hierarchically, in blocks, and contiguously. This can improve the computation and access efficiency of cross-matching, leakage source monitoring, and other astronomical calculations. Second, the integration of astronomical data: most online cross-matching tools cannot handle matching on large-scale data, and to overcome this limitation we propose a large-scale astronomical data cross-matching algorithm based on Spark. By analyzing the principle of cross-matching, we use HEALPix to solve the problem of distance matching in cross-matching. We also propose several optimizations based on the characteristics of Spark that improve the efficiency of cross-matching on large-scale data. Experiments confirm that both methods are usable in practice: all processing and analysis of large-scale astronomical data can be completed in a short time. The results of this thesis can provide a comprehensive technical reference for astronomical research in the big astronomical data environment.
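The cell-based cross-matching idea summarized above can be illustrated with a minimal sketch. It is not the thesis's algorithm: a simple equal-angle RA/Dec grid stands in for the HEALPix index, and plain Python dictionaries stand in for the key-based join that Spark would perform across partitions. All function names (`ang_sep_deg`, `cell_of`, `covering_cells`, `cross_match`) and the `cell_deg` parameter are hypothetical, and `cell_deg` is assumed to divide 360 evenly.

```python
import math
from collections import defaultdict


def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cell_of(ra, dec, cell_deg):
    """Grid cell (i_ra, i_dec) containing a position; a stand-in for a HEALPix pixel."""
    n_ra = round(360 / cell_deg)
    n_dec = round(180 / cell_deg)
    i_ra = int((ra % 360.0) // cell_deg) % n_ra
    i_dec = min(int((dec + 90.0) // cell_deg), n_dec - 1)
    return (i_ra, i_dec)


def covering_cells(ra, dec, radius_deg, cell_deg):
    """All grid cells touched by the bounding box of a search disc."""
    n_ra = round(360 / cell_deg)
    n_dec = round(180 / cell_deg)
    dec_lo = max(-90.0, dec - radius_deg)
    dec_hi = min(90.0, dec + radius_deg)
    j_lo = min(int((dec_lo + 90.0) // cell_deg), n_dec - 1)
    j_hi = min(int((dec_hi + 90.0) // cell_deg), n_dec - 1)
    if abs(dec) + radius_deg >= 90.0:
        # The disc contains a pole, so every RA column is needed.
        ra_idx = set(range(n_ra))
    else:
        # Half-width in RA of a spherical cap centred at this declination.
        dra = math.degrees(math.asin(
            math.sin(math.radians(radius_deg)) / math.cos(math.radians(dec))))
        i_lo = math.floor((ra - dra) / cell_deg)
        i_hi = math.floor((ra + dra) / cell_deg)
        ra_idx = {i % n_ra for i in range(i_lo, i_hi + 1)}  # wraps at RA 0/360
    return {(i, j) for i in ra_idx for j in range(j_lo, j_hi + 1)}


def cross_match(cat_a, cat_b, radius_deg, cell_deg=1.0):
    """Match (id, ra, dec) rows of cat_a to rows of cat_b within radius_deg."""
    # Replicate each cat_b source into every cell its search disc may touch.
    buckets = defaultdict(list)
    for row in cat_b:
        for cell in covering_cells(row[1], row[2], radius_deg, cell_deg):
            buckets[cell].append(row)
    # Each cat_a source is compared only with cat_b sources in its own cell.
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        for id_b, ra_b, dec_b in buckets.get(cell_of(ra_a, dec_a, cell_deg), ()):
            sep = ang_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg:
                matches.append((id_a, id_b, sep))
    return sorted(matches)
```

Calling `cross_match(cat_a, cat_b, radius_deg=5 / 3600)` on two lists of `(id, ra, dec)` tuples returns the matched pairs with their separations in degrees. The point of the cell key is that the exact distance test runs only on candidates sharing a cell, which is the role the abstract assigns to HEALPix; in a Spark setting the dictionary lookup would presumably become a join on that key.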
Keywords/Search Tags:Distributed computing, Spark, Big Data, Astronomy, Cross-Matching, Footprint