
Parallel Acceleration Research And Application Of Pulsar Search Data Processing

Posted on: 2022-10-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S P You
GTID: 1480306731468564
Subject: Computer software and theory
Abstract/Summary:
In recent years, as data volumes have grown, so has the demand for computing power. At the same time, heterogeneous systems that combine traditional general-purpose processors (CPUs) with new types of co-processors (GPUs, FPGAs, Xeon Phi, etc.) have become a way to solve many kinds of computational problems; they have gradually gained wide recognition and moved from academia into industry. Conventional general-purpose processors handle complex control tasks and provide general-purpose computing, while the new co-processors, with their large numbers of cores and high performance, accelerate computation. However, because heterogeneous systems differ from traditional computing in system architecture, instruction set, and programming model, developing and debugging applications for them has become increasingly complicated, especially since the emergence of many-core GPUs. With hundreds of computing cores in a single system, such GPUs place higher demands on parallelizing application algorithms originally designed for traditional general-purpose processors. It is therefore necessary to study CPU/GPU parallel algorithms and their key optimization techniques: by increasing the parallelism of an algorithm and optimizing its storage structure, one can fully exploit the computational potential of heterogeneous systems and use their many computing cores to accelerate applications. This dissertation studies CPU/GPU parallel algorithms and their key optimization techniques. Targeting the different problems that arise at different steps of the pulsar search data processing flow, it applies parallel acceleration in a heterogeneous system environment and applies the results to specific projects, achieving good speedups. The main work and contributions of this dissertation are as follows:

(1) We studied how to accelerate a pulsar search pipeline based on the well-known pulsar search tool PRESTO. The incoherent de-dispersion step, which consumes about 90% of the running time, was accelerated on the GPU, which substantially improves de-dispersion performance: tests show a speedup of more than 10 times over single-threaded execution on a PC. The proposed parallel de-dispersion algorithm supports multiple GPUs; on 6 GPUs it is nearly 60 times faster than single-threaded execution on a PC processor and more than 200 times faster than single-threaded execution on a server. Processing 2048 channels on the GPU takes 816 seconds, about the same as single-threaded PC processing of 256 channels (812 seconds); however, the signal-to-noise ratio (SNR) obtained from the 2048 channels on the GPU is 149.4, compared with 53.4 for the 256-channel single-threaded PC run. This improvement of nearly 3 times (149.4/53.4 is about 2.8) makes the subsequent identification of pulsar candidates easier. For the real-valued fast Fourier transform (FFT), acceleration search, and folding steps of the FAST pulsar search pipeline, we combine a parallel task queue with thread pools on the CPU. Tests on a PC with an 8-thread CPU show a speedup of more than 4 times over single-threaded execution. We also tested a simulated file generated with the parameters of the FAST pulsar survey on a PC and on a server: the search achieved a speedup of more than 10 times over single-threaded execution on the PC, and more than 100 times over single-threaded execution on the server.

(2) We studied the single-pulse search and identification process and reprocessed FAST ultra-wideband data, discovering 7 pulsar candidates, 2 of which have been confirmed and announced. A total of 112 pulsars were detected in the single-pulse search, 45 of which were newly discovered by FAST. We designed a parallelized single-pulse search and identification pipeline. De-dispersion, the most time-consuming step, is computed on the GPU; the other three steps (single-pulse search, single-pulse candidate screening, and data folding) are implemented on the CPU. This pipeline was used to reprocess the drift-scan data taken with the ultra-wideband receiver during FAST commissioning from August 2017 to May 2018. Single-pulse search and identification were parallelized with Python's multiprocessing module. The single-pulse candidate files produced by the search are grouped and then identified; we found that a denser de-dispersion grid effectively improves single-pulse identification. After identification, the fast folding algorithm (FFA) was used for a periodicity search, and the period with the maximum signal-to-noise ratio was used to fold the data and obtain pulsar candidates. Pulsar classification and diagnostic tools were then used to inspect the distribution of signal-to-noise ratio versus dispersion measure (DM), to determine whether a single pulse is of cosmic radio origin. We also built a single-pulse search database for the FAST ultra-wideband data to manage the detected single-pulse groups and related information; it records a total of 2,244,298 single-pulse group candidates. We then compared the FAST single-pulse database with the Parkes single-pulse search database: over the overlapping sky region, FAST matched a total of 21 pulsars, including 4 pulsars newly discovered by FAST. A comparison of DM values showed that 8 known pulsars were detected by both FAST and Parkes.

(3) We studied parallelization strategies for the fast folding algorithm and proposed a GPU-based parallel FFA that uses the thread block as its unit of work. The algorithm is divided into two parts: intra-block parallelism and inter-block parallelism. We analyzed the factors that affect the performance of the GPU-parallel FFA. For time series stored in double-precision floating-point format, the computational intensity of the calculation is less than 1/24, so the main factor limiting the performance of the fast folding algorithm is memory bandwidth. Several basic strategies for improving effective memory bandwidth were discussed, but GPU hardware limits (global memory size, shared memory size, etc.) restrict how far the parallelism can scale. Building on the original pulse evaluation algorithm in ffancy, an FFA-based periodicity search tool, we present a parallel pulse evaluation algorithm that uses the median absolute deviation (MAD) for data normalization and filtering. The GPU-based parallel FFA and the parallel pulse evaluation algorithm were tested on a PC with a 4-core, 8-thread CPU and a GPU. On a simulated file, the periodicity search without down-sampling took 31.12 seconds on the CPU and 8.02 seconds with the parallel GPU implementation, a speedup of about 3.88. The experiments show that, with the current parallel search algorithm, running on the GPU at the cost of slightly more time than serial execution on the CPU yields better periodicity-search results than the CPU.

This work made use of data from FAST (Five-hundred-meter Aperture Spherical radio Telescope). FAST is a Chinese national mega-science facility operated by the National Astronomical Observatories, Chinese Academy of Sciences. Our results have been applied to pulsar search data processing at FAST's early science data center, providing technical support for processing the data of FAST's ongoing 19-beam pulsar survey and other observations.
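To make the incoherent de-dispersion step in (1) concrete, here is a minimal plain-Python reference sketch of brute-force incoherent de-dispersion, i.e. the computation a GPU kernel would parallelize. It assumes filterbank data held as one list of samples per frequency channel and uses the standard cold-plasma delay, delay = k_DM * DM * (f^-2 - f_ref^-2) with k_DM of about 4.148808e3 s MHz^2 pc^-1 cm^3. The names `dispersion_delay` and `dedisperse` are hypothetical; this is not PRESTO's or the dissertation's implementation.

```python
K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3


def dispersion_delay(dm, f_mhz, f_ref_mhz):
    """Cold-plasma delay (s) at f_mhz relative to f_ref_mhz for a given DM."""
    return K_DM * dm * (f_mhz ** -2 - f_ref_mhz ** -2)


def dedisperse(data, freqs_mhz, dm_trials, tsamp):
    """Brute-force incoherent de-dispersion.

    data: one list of samples per channel (same length for all channels).
    freqs_mhz: centre frequency of each channel, in the same order as data.
    Returns one de-dispersed time series per trial DM.
    """
    f_ref = max(freqs_mhz)  # align to the highest frequency, so all shifts >= 0
    nsamp = len(data[0])
    out = []
    for dm in dm_trials:
        # Per-channel delay converted to a whole number of samples.
        shifts = [round(dispersion_delay(dm, f, f_ref) / tsamp) for f in freqs_mhz]
        nout = nsamp - max(shifts)  # samples covered by every channel
        if nout <= 0:
            raise ValueError("time series too short for DM %g" % dm)
        series = [0.0] * nout
        # Each (dm, t) output is an independent sum over channels; this is
        # the loop nest a GPU kernel would spread across threads.
        for chan, shift in zip(data, shifts):
            for t in range(nout):
                series[t] += chan[t + shift]
        out.append(series)
    return out
```

Because every (trial DM, output sample) pair is computed independently, a GPU version would typically assign one thread per output sample and one grid dimension per trial DM; a multi-GPU variant can be pictured as splitting the list of trial DMs across devices.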
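The CPU-side scheme in (1), a parallel task queue drained by a pool of threads, can be sketched with Python's standard `queue` and `threading` modules. This is an illustrative assumption about the structure, not the dissertation's code; `run_task_queue` and `worker_fn` are hypothetical names. In the default CPython build, threads only give real speedups when each task spends its time outside the interpreter, for example waiting on an external PRESTO program such as `accelsearch` or `prepfold` launched through `subprocess.run`; CPU-bound pure-Python work would need processes instead.

```python
import queue
import threading


def run_task_queue(tasks, worker_fn, n_workers=8):
    """Process tasks from a shared queue with a fixed pool of worker threads.

    Returns the results of worker_fn in the original task order and re-raises
    the first exception raised by any task.
    """
    q = queue.Queue()
    n_tasks = 0
    for idx, task in enumerate(tasks):
        q.put((idx, task))
        n_tasks += 1
    results = [None] * n_tasks
    errors = []

    def worker():
        while True:
            try:
                idx, task = q.get_nowait()
            except queue.Empty:
                return  # queue drained, so this worker exits
            try:
                # Each index is written by exactly one thread: no lock needed.
                results[idx] = worker_fn(task)
            except Exception as exc:
                errors.append(exc)  # list.append is atomic in CPython

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return results
```

In a PRESTO-style pipeline each task could be one de-dispersed time series, with `worker_fn` running the FFT, acceleration search, and folding for that series in turn. `concurrent.futures.ThreadPoolExecutor` packages the same queue-plus-pool pattern if an explicit queue is not needed.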
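The grouping of single-pulse candidates and the DM-versus-SNR check in (2) can be illustrated with a small friends-of-friends clustering of single-pulse events. This is a hedged sketch built on assumptions of our own: events are (DM, time in seconds, SNR) tuples, two events are linked when they lie within `dt_max` seconds and `ddm_max` DM units of each other, and a group is flagged as possibly astrophysical when its brightest event sits above a hypothetical `dm_min`, since undispersed terrestrial RFI tends to peak near DM = 0. The dissertation's actual grouping rules are not given here. With a denser DM grid a real pulse leaves more linked events and a clearer SNR peak in DM, which is one way to read the finding that a denser grid helps identification.

```python
def group_events(events, dt_max, ddm_max):
    """Friends-of-friends grouping of (dm, time_s, snr) single-pulse events."""
    n = len(events)
    parent = list(range(n))  # union-find forest over event indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = sorted(range(n), key=lambda i: events[i][1])  # sort by arrival time
    for pos, a in enumerate(order):
        for b in order[pos + 1:]:
            if events[b][1] - events[a][1] > dt_max:
                break  # all later events are even further away in time
            if abs(events[b][0] - events[a][0]) <= ddm_max:
                parent[find(a)] = find(b)  # merge the two clusters
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(events[i])
    return list(groups.values())


def summarize(group, dm_min=2.0):
    """Return the brightest event of a group and whether its DM exceeds dm_min."""
    peak = max(group, key=lambda e: e[2])
    return peak, peak[0] > dm_min
```

Independent candidate files (for example one per beam or per observation) could be grouped in parallel with `multiprocessing.Pool.map`, in the spirit of the multiprocessing-based parallelization described in (2).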
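For the fast folding algorithm used in (2) and (3), the following is a minimal plain-Python sketch of the classic Staelin-style FFA recursion. It assumes the series has already been cut into m rows of p samples with m a power of two; `to_rows`, `ffa`, and `trial_period` are hypothetical names, and the circular (wrap-around) shifts are a common simplification, not necessarily what ffancy or the dissertation's GPU kernel does. Each level of the recursion combines pairs of partial profiles with one addition per sample, and the m outputs of a level are independent of one another, which is what makes block-level parallelism on a GPU natural.

```python
def to_rows(series, p):
    """Cut a series into m rows of p samples, m = largest power of two that fits."""
    m = len(series) // p
    m = 1 << (m.bit_length() - 1) if m else 0
    return [series[i * p:(i + 1) * p] for i in range(m)]


def ffa(rows):
    """Staelin-style fast folding transform with circular shifts.

    rows: m rows of p samples each (m a power of two).
    Returns m profiles; profile s adds row i shifted left by about
    i * s / (m - 1) bins, i.e. a fold at p + s / (m - 1) samples.
    """
    m = len(rows)
    if m < 1 or m & (m - 1):
        raise ValueError("number of rows must be a power of two")
    if m == 1:
        return [list(rows[0])]
    half = m // 2
    top = ffa(rows[:half])  # partial profiles of the first half
    bot = ffa(rows[half:])  # partial profiles of the second half
    p = len(rows[0])
    out = []
    for s in range(m):
        h = s // 2           # drift index used inside each half
        b = (s - h) % p      # extra offset of the second half
        shifted = bot[h][b:] + bot[h][:b]  # circular left shift by b bins
        out.append([x + y for x, y in zip(top[h], shifted)])
    return out


def trial_period(p, m, s):
    """Trial period (in samples) of FFA profile s for m rows of p samples."""
    return p + s / (m - 1) if m > 1 else float(p)
```

One call covers trial periods from p to p + 1 samples; a full search typically repeats it over a range of base periods p and down-sampling factors, then scores each profile.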
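One way to arrive at the 1/24 bound in (3) is sketched below, assuming each FFA step reads two double-precision operands (8 bytes each) from memory and writes one 8-byte result for a single floating-point addition, with no on-chip reuse. Any extra traffic (for example loading shift indices) only lowers I further, consistent with the "less than 1/24" statement. Under the roofline model, memory bandwidth B then caps throughput at B/24; for a hypothetical GPU with B = 480 GB/s that is 20 GFLOP/s, typically well below the double-precision peak, so the step is bandwidth-bound.

```latex
I = \frac{\text{FLOPs}}{\text{bytes moved}}
  = \frac{1}{8 + 8 + 8}
  = \frac{1}{24}\ \text{FLOP/byte},
\qquad
P_{\text{attainable}} = \min\!\left(P_{\text{peak}},\; I \cdot B\right)
  = \min\!\left(P_{\text{peak}},\; \frac{B}{24}\right)
```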
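The MAD-based normalization mentioned in (3) can be sketched as follows: a folded profile is centred on its median and scaled by 1.4826 times its MAD (the factor that makes MAD a consistent estimate of the standard deviation for Gaussian noise), and the pulse is then scored by the best circular boxcar SNR. This is a hedged illustration; `normalize_mad` and `best_boxcar_snr` are hypothetical names, and the boxcar widths are arbitrary choices, not ffancy's parameters.

```python
import math
import statistics


def normalize_mad(profile):
    """Subtract the median and divide by 1.4826 * MAD (robust sigma estimate)."""
    med = statistics.median(profile)
    mad = statistics.median([abs(x - med) for x in profile])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; profile is (nearly) constant")
    return [(x - med) / scale for x in profile]


def best_boxcar_snr(profile, widths=(1, 2, 3, 4)):
    """Return (snr, start_bin, width) of the best circular boxcar on the profile.

    A sum of w unit-sigma bins has sigma sqrt(w), hence SNR = sum / sqrt(w).
    Boxcars wrap around the end of the profile because pulse phase is cyclic.
    """
    z = normalize_mad(profile)
    n = len(z)
    best = (float("-inf"), 0, 0)
    for w in widths:
        for start in range(n):
            total = sum(z[(start + k) % n] for k in range(w))
            snr = total / math.sqrt(w)
            if snr > best[0]:
                best = (snr, start, w)
    return best
```

Every (width, start) pair is scored independently, so on a GPU these two loops map naturally onto threads, with the median and MAD computed once per profile.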
Keywords/Search Tags: Parallel computing, GPU, De-dispersion algorithm, Single-pulse search, Fast folding algorithm