
Parallel Optimization Study Of SKA Low Frequency Continuous Spectrum Imaging Pipeline

Posted on: 2024-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wei
Full Text: PDF
GTID: 2530307157982019
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
The Square Kilometre Array (SKA) radio telescope is an international big-science project in which China is a core member, dedicated to answering major frontier scientific questions such as the origin of the universe. The continuous-spectrum (continuum) imaging pipeline is one of the main methods for processing the SKA's continuum survey observations. The GaLactic and Extragalactic All-sky Murchison Widefield Array survey eXtended (GLEAM-X) project has produced 2 PB of low-frequency observations. The existing continuum imaging pipeline processes data serially, which is inefficient and cannot meet the demand for batch processing of petabyte-scale observations. This thesis therefore studies the parallel optimization of the imaging pipeline, using GLEAM-X survey data, with the aim of an efficient and highly scalable data processing pipeline. The main research contents are as follows:
(1) To provide an experimental environment for the parallel optimization study, and a reference case for future deployment of the imaging pipeline at other SKA regional centers, the GLEAM-X imaging pipeline was deployed locally on the prototype of the China SKA Regional Center. The correctness of the pipeline was verified and performance tests were completed. The experimental results show that the serial imaging pipeline is slow and inefficient for large-scale data processing.
(2) To address the shortcomings of the serial imaging pipeline, a parallel processing method based on the Simple Linux Utility for Resource Management (SLURM) job scheduling system was studied and implemented. The method uses SLURM's task scheduling to run the pipeline on multiple nodes in parallel: the pipeline processing of different data is distributed evenly across multiple compute nodes by cyclic parallelization, a load-balanced scheduling strategy. The experimental results show that the method improves the speed and efficiency of batch data processing compared with the serial imaging pipeline, but it still scales poorly and requires manual resource scheduling.
(3) To address the problems of (2), a parallel processing method based on the Message Passing Interface (MPI) programming model was studied and implemented. The method adopts the MPI peer-to-peer model: through MPI process management, the pipeline processing of different data is divided into several parallel processes of equal status that perform similar functions, and these processes are automatically allocated to multiple compute nodes in a load-balanced manner. This gives multi-node parallel processing of the imaging pipeline with automated resource scheduling. The experimental results show that this method has better parallel efficiency and scalability than the SLURM-based imaging pipeline, but its stability and ease of use are poor.
(4) Finally, to address the problems of methods (2) and (3), a parallel processing method based on the distributed execution framework DALiuGE (Data Activated Liu Graph Engine) was studied and implemented. The method distributes the pipeline processing of different data to multiple compute nodes for parallel execution using real-time scheduling, which achieves multi-node parallel processing of the pipeline. The experimental results show that the DALiuGE-based method offers better processing performance, higher scalability, and better ease of use, providing a feasible solution for processing the SKA's massive data volumes.
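The difference between the cyclic distribution in (2) and the real-time scheduling in (4) can be illustrated with a toy Python sketch. This is not the thesis's implementation and makes no SLURM, MPI, or DALiuGE calls; the function names (`cyclic_assign`, `dynamic_assign`, `makespan`) and the per-observation processing times are hypothetical, chosen only to show how the two strategies assign work when observations take different amounts of time.

```python
import heapq


def cyclic_assign(costs, n_workers):
    """Static cyclic (round-robin) split: item i goes to worker i % n_workers."""
    return [costs[w::n_workers] for w in range(n_workers)]


def dynamic_assign(costs, n_workers):
    """Dynamic scheduling: each item goes to the worker that becomes free first."""
    # Heap entries are (time at which the worker is free, worker id).
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_workers)]
    for cost in costs:
        free_at, w = heapq.heappop(heap)
        buckets[w].append(cost)
        heapq.heappush(heap, (free_at + cost, w))
    return buckets


def makespan(buckets):
    """Total wall time: the busiest worker determines when the batch finishes."""
    return max(sum(b) for b in buckets)


# Hypothetical processing times (arbitrary units) for eight observations.
costs = [9, 1, 1, 1, 9, 1, 1, 1]

print("cyclic :", cyclic_assign(costs, 2), "makespan", makespan(cyclic_assign(costs, 2)))
print("dynamic:", dynamic_assign(costs, 2), "makespan", makespan(dynamic_assign(costs, 2)))
```

For this made-up workload on two workers, the cyclic split finishes after 20 time units and the dynamic assignment after 12. The sketch only shows the general scheduling idea and says nothing about the performance figures measured in the thesis.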
Keywords/Search Tags: Square Kilometre Array, China SKA Regional Center, Continuum Imaging Pipeline, Parallel Computing, DALiuGE