
Research On High-Efficient Massive Data Oriented Astronomical Cross-Match

Posted on: 2011-10-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhao
GTID: 1118330338983308
Subject: Computer application technology
Abstract/Summary:
Astronomical cross-match is the core technique for aggregating multi-band data. After cross-matching, the multi-band or full-band data carries more information about the physical nature of celestial objects, so it is a key step in deepening astronomers' understanding of these objects and accelerating new scientific discoveries. Because astronomical data sets are usually very large, the problem must be solved with computing technologies such as parallel computing, distributed computing, and massive data processing. Building on previous research, this thesis designs and implements an efficient parallel cross-match function for multi-core environments and a distributed cross-match function for large-scale cluster environments, and makes some breakthroughs on the main performance bottleneck: overly frequent and time-consuming data I/O operations. As a result, large-scale cross-match computing on massive data sets becomes practical.

First, a parallel cross-match function for multi-core environments is designed. Adopting HEALPix, a pseudo two-dimensional spherical indexing method, not only speeds up data queries but also reduces the time complexity of cross-match computing by partitioning the data into regions. Then, for the classic border source-leakage problem in cross-match, a solution is proposed that guarantees the completeness of the results. Experiments show that this method contributes greatly to the efficiency of cross-match. After that, a new data loading and computing flow model, named the boundary growing model, and a basic task assignment and scheduling unit, named the biggest growing block, are proposed. They not only reduce how often data is re-read but also filter data by spatial area. The experimental results show that they further improve the efficiency of cross-match by about 50 percent.
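The effect of regional partitioning and the border source-leakage problem can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes a flat grid of equal-sized cells in right ascension and declination in place of HEALPix, a cell size much larger than the match radius, and sources away from the poles. The names `cell_of`, `cross_match`, and `check_neighbors` are hypothetical. Each source is compared only with sources in its own cell and, optionally, the eight surrounding cells; skipping the surrounding cells loses pairs that straddle a cell border.

```python
import math
from collections import defaultdict


def unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def angular_sep(a, b):
    """Angular separation in radians, computed from the chord length."""
    ua, ub = unit_vector(*a), unit_vector(*b)
    chord = math.sqrt(sum((x - y) ** 2 for x, y in zip(ua, ub)))
    return 2.0 * math.asin(min(1.0, chord / 2.0))


def cell_of(ra_deg, dec_deg, cell_deg):
    """Hypothetical flat-grid cell id (assumes 360 / cell_deg is an integer)."""
    n_ra = int(round(360.0 / cell_deg))
    return (int(ra_deg // cell_deg) % n_ra,
            int((dec_deg + 90.0) // cell_deg))


def cross_match(cat_a, cat_b, radius_arcsec, cell_deg=1.0, check_neighbors=True):
    """Return (i, j) pairs with cat_a[i] within radius_arcsec of cat_b[j]."""
    n_ra = int(round(360.0 / cell_deg))
    radius = math.radians(radius_arcsec / 3600.0)

    # Partition catalogue B by cell so each lookup touches only nearby sources.
    index = defaultdict(list)
    for j, pos in enumerate(cat_b):
        index[cell_of(pos[0], pos[1], cell_deg)].append(j)

    # With check_neighbors=False only the source's own cell is searched,
    # which is where border sources leak out of the result.
    offsets = (-1, 0, 1) if check_neighbors else (0,)
    matches = []
    for i, pos in enumerate(cat_a):
        ci, cj = cell_of(pos[0], pos[1], cell_deg)
        for di in offsets:
            for dj in offsets:
                key = ((ci + di) % n_ra, cj + dj)  # RA wraps around at 360 degrees
                for j in index.get(key, ()):
                    if angular_sep(pos, cat_b[j]) <= radius:
                        matches.append((i, j))
    return matches


cat_a = [(9.9999, 20.0), (359.9999, 0.0)]
cat_b = [(10.0001, 20.0), (10.5, 20.0), (0.0001, 0.0)]
print(cross_match(cat_a, cat_b, 1.0))
print(cross_match(cat_a, cat_b, 1.0, check_neighbors=False))
```

In this example both true pairs lie on opposite sides of a cell border, so the second call, which ignores neighboring cells, finds nothing. A production version would replace the flat grid with a spherical index, because flat cells in right ascension shrink toward the poles.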
In addition, the thesis validates the feasibility of these cross-match methods under the HTM index function through both theoretical and experimental analysis. It can therefore be concluded that, unlike traditional methods, these methods do not depend on a single index function.

To break through the performance limits of relational databases when processing massive data, and to meet the storage requirements of massive astronomical observation data, the thesis further presents a new cross-match function based on the MapReduce distributed computing model and its corresponding distributed file system. Following the distinguishing features of MapReduce, the data distribution among computing nodes is rearranged to reduce inter-node communication as much as possible; as a result, a near-linear speedup is achieved. The experimental results show that this method greatly outperforms the above-mentioned parallel cross-match methods based on a relational database on a multi-core platform. It lays a foundation for implementing a real-time online cross-match service in the future.

Furthermore, the fast bit-operation algorithms proposed in this thesis for calculating the index numbers of neighboring blocks are not only a basic guarantee of high-efficiency cross-match but also play an important role in other astronomical data processing applications such as cone search. The efficient cross-match approaches proposed in this thesis, which use parallel computing, distributed computing, and massive data processing technologies, have high reference value for solving other large-scale astronomical data processing problems in the future.
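The abstract does not describe the bit-operation algorithms themselves, so the following is only a sketch of a well-known related technique, not the thesis's algorithm. It assumes a square grid of 2^order by 2^order blocks whose index interleaves the bits of the block's x and y coordinates (a Z-order or Morton index, which is also how the HEALPix NESTED scheme numbers pixels inside a single base pixel). Neighbors that fall outside the grid, which in HEALPix would belong to an adjacent base pixel, are simply dropped. All function names are hypothetical.

```python
X_MASK = 0x5555555555555555  # even bit positions hold the bits of x
Y_MASK = 0xAAAAAAAAAAAAAAAA  # odd bit positions hold the bits of y


def interleave(x, y, order):
    """Build the interleaved index of block (x, y) bit by bit."""
    z = 0
    for bit in range(order):
        z |= ((x >> bit) & 1) << (2 * bit)
        z |= ((y >> bit) & 1) << (2 * bit + 1)
    return z


# Each step adds or subtracts 1 on one coordinate without de-interleaving.
# Filling the other coordinate's bits with ones lets a carry pass through
# them; clearing them lets a borrow pass through them.
def x_plus(z):
    return (((z | Y_MASK) + 1) & X_MASK) | (z & Y_MASK)


def x_minus(z):
    return (((z & X_MASK) - 1) & X_MASK) | (z & Y_MASK)


def y_plus(z):
    return (((z | X_MASK) + 1) & Y_MASK) | (z & X_MASK)


def y_minus(z):
    return (((z & Y_MASK) - 1) & Y_MASK) | (z & X_MASK)


def same(z):
    return z


def neighbors(z, order):
    """Indices of the up-to-8 neighbors of block z that lie inside the grid.

    Assumes order <= 31. A step off the grid edge sets bits at or above
    position 2 * order, so comparing with the index limit rejects it.
    """
    limit = 1 << (2 * order)
    found = []
    for step_y in (y_minus, same, y_plus):
        for step_x in (x_minus, same, x_plus):
            n = step_x(step_y(z))
            if n != z and n < limit:
                found.append(n)
    return found


print(sorted(neighbors(interleave(1, 1, 2), 2)))
```

A cone search or a border check in a cross-match can use such a routine to list the blocks around a given block with a few integer operations per neighbor, instead of converting each index back to coordinates.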
Keywords/Search Tags: Astronomical Cross-Match, Boundary Growing Model, MapReduce, HTM, HEALPix