The backprojection radar imaging algorithm demands extremely high performance from the computing system. Based on a thorough analysis of the algorithm, we design a high-performance computing system and apply several key techniques and optimizations to improve its performance.

The preprocessing part of the algorithm involves processing large volumes of long, variable-length vectors, including both regular vector computations and FFTs. SIMD processors are well suited to accelerating a wide variety of regular vector computations. The FFT, however, differs from these regular computations, and most real-time systems execute it on dedicated FFT accelerators. We design a SIMD processor with FFT acceleration instructions whose FFT efficiency is as high as that of dedicated accelerators. As a result, the extra cost of a separate FFT accelerator is avoided.

A high-performance, low-cost Backprojection Engine is designed to accelerate the core backprojection computation. Each Backprojection Engine is pipelined and performs the backprojection computation for one pixel per clock cycle. Based on a precision analysis, the precision-critical arithmetic is implemented in fixed point instead of the double-precision floating point used in the reference software. This reduces logic hardware cost by nearly 50% and on-chip memory cost by 37.5%, while also improving precision: the error of the most precision-critical value, the phase, drops from 11° to 1.4°.

Eight Backprojection Engines are integrated into the Backprojection subsystem for parallel computing. Instead of the original pixel-level parallelism, we design pulse-level parallelism and a corresponding parallel architecture. This reduces both the required DDR access bandwidth and the on-chip memory for pixel values by 87.5%.
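The text above does not give the Backprojection Engine's number formats. To illustrate why fixed point suits the phase term, the following minimal C sketch keeps the two-way phase 4πR/λ entirely in integers, under formats that are our assumptions rather than the design's: range R in unsigned Q16.16 metres, the constant 2/λ in unsigned Q7.25 turns per metre, and the phase as a 32-bit binary angle in which 2^32 LSBs equal one full turn (2π). All helper names (`quantize_range`, `quantize_k`, `phase_bam`, and so on) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical fixed-point formats (assumptions for illustration only):
 *   range R      : unsigned Q16.16 metres   (requires R < 65536 m, LSB ~15 um)
 *   K = 2/lambda : unsigned Q7.25 turns/m   (requires lambda > 2/128 m)
 *   phase        : 32-bit binary angle, 2^32 LSBs = one turn (2*pi rad) */
#define RANGE_FRAC_BITS 16
#define K_FRAC_BITS     25
#define PHASE_SHIFT     (RANGE_FRAC_BITS + K_FRAC_BITS - 32)

/* Round a non-negative range in metres to Q16.16. */
static uint32_t quantize_range(double r_m) {
    return (uint32_t)(r_m * (double)(1u << RANGE_FRAC_BITS) + 0.5);
}

/* Round the constant 2/lambda (turns per metre of range) to Q7.25. */
static uint32_t quantize_k(double lambda_m) {
    return (uint32_t)((2.0 / lambda_m) * (double)(1u << K_FRAC_BITS) + 0.5);
}

/* Two-way phase 4*pi*R/lambda as a binary angle. The 64-bit product holds
 * R*K with 41 fractional bits; shifting right by 9 leaves 32 fractional bits,
 * and truncating to uint32_t discards the whole turns, so the wrap modulo
 * 2*pi needs no extra logic. */
static uint32_t phase_bam(uint32_t r_q, uint32_t k_q) {
    uint64_t prod = (uint64_t)r_q * (uint64_t)k_q; /* < 2^64: both < 2^32 */
    return (uint32_t)(prod >> PHASE_SHIFT);
}

/* Double-precision reference phase as a binary angle (for checking only). */
static uint32_t phase_bam_ref(double r_m, double lambda_m) {
    double turns = 2.0 * r_m / lambda_m;
    double frac = turns - (double)(uint64_t)turns; /* fractional turn in [0,1) */
    return (uint32_t)(frac * 4294967296.0);
}

/* Signed wrapped difference between two binary angles, in degrees. */
static double phase_error_deg(uint32_t a, uint32_t b) {
    int32_t d = (int32_t)(a - b); /* modular subtraction wraps at +/-180 deg */
    return (double)d * (360.0 / 4294967296.0);
}
```

With λ = 0.03 m and ranges up to about 20 km, these assumed widths keep `phase_bam(quantize_range(r), quantize_k(0.03))` within a fraction of a degree of the double-precision reference. The binary-angle choice is what makes the integer version cheap: the large whole-turn part of R·2/λ, which is where floating point spends its mantissa bits, is simply dropped. The widths an actual engine uses would come from the precision analysis described above.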
Compared with a single Backprojection Engine, the subsystem achieves a speedup of more than 7.99 with the same on-chip memory for pixel values and the same main-memory bandwidth, at the cost of 8 times the logic resources and on-chip memory for pulse values.

To reduce the simulation time of the Backprojection algorithm, we accelerate it on a GPU. Based on an analysis of both the algorithm and the target platform, pixel-level parallelism is selected and a parallelized version of the algorithm is implemented in CUDA. The simulation time is reduced from 5 h 23 min to 3 min 20 s, a speedup of 97 times.
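The two parallel decompositions mentioned above, pulse-level for the eight hardware engines and pixel-level for the CUDA version, differ mainly in loop order. The C sketch below is a simplified model, not the actual design: `bp_term` is a hypothetical placeholder that returns one integer sample instead of interpolating the range-compressed pulse and applying the phase, which lets the two orders be checked for exact equality. In `bp_pixel_level`, each outer iteration is independent and corresponds to one GPU thread that owns one pixel, so no two threads write the same output. In `bp_pulse_level`, each pixel is read from and written back to main memory once per group of `N_ENGINES` pulses rather than once per pulse.

```c
#include <stddef.h>
#include <stdint.h>

#define N_ENGINES 8 /* number of Backprojection Engines, as in the text */

/* Hypothetical stand-in for one engine's per-(pulse, pixel) work. A real
 * engine would compute the range, interpolate the range-compressed pulse and
 * apply the phase term; this just picks one integer sample. */
static int64_t bp_term(const int32_t *pulses, size_t n_bins,
                       size_t pulse, size_t pixel) {
    size_t bin = (pixel * 7u + pulse * 3u) % n_bins;
    return pulses[pulse * n_bins + bin];
}

/* Pixel-level decomposition (the GPU mapping): each outer iteration would be
 * one CUDA thread owning one pixel and summing over all pulses. */
static void bp_pixel_level(const int32_t *pulses, size_t n_pulses,
                           size_t n_bins, int64_t *image, size_t n_pixels) {
    for (size_t pix = 0; pix < n_pixels; ++pix) { /* "thread index" */
        int64_t acc = 0;
        for (size_t p = 0; p < n_pulses; ++p)
            acc += bp_term(pulses, n_bins, p, pix);
        image[pix] = acc;
    }
}

/* Pulse-level decomposition (the hardware subsystem): pulses are taken in
 * groups of N_ENGINES; each pixel is read once per group, passes through all
 * engines (engine e adds pulse g + e), and is written back once.
 * Returns the number of pixel read-modify-write transfers to main memory. */
static size_t bp_pulse_level(const int32_t *pulses, size_t n_pulses,
                             size_t n_bins, int64_t *image, size_t n_pixels) {
    size_t transfers = 0;
    for (size_t pix = 0; pix < n_pixels; ++pix)
        image[pix] = 0;
    for (size_t g = 0; g < n_pulses; g += N_ENGINES) {
        for (size_t pix = 0; pix < n_pixels; ++pix) {
            int64_t v = image[pix]; /* one read per group */
            for (size_t e = 0; e < N_ENGINES && g + e < n_pulses; ++e)
                v += bp_term(pulses, n_bins, g + e, pix); /* engine e */
            image[pix] = v; /* one write per group */
            ++transfers;
        }
    }
    return transfers;
}
```

In this model the pulse-level order moves n_pixels × ⌈n_pulses / 8⌉ pixel values instead of n_pixels × n_pulses when one pulse is processed per pass, one eighth of the traffic, which is consistent with the 87.5% reduction reported above. The pixel-level order needs no such batching on a GPU because each thread keeps its accumulator in a register until the final write.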