Fine-grained Algorithm And Architecture For Data Processing In SAR Applications

Posted on: 2011-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhou
GTID: 1118330341951651
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Synthetic Aperture Radar (SAR) technology towards high resolution, wide range, multiple bands, multiple polarizations, and multiple modes, data volume and computational complexity are increasing significantly. Meanwhile, SAR processing systems are mounted on aircraft, satellites, or missiles, so miniaturization, light weight, and low power consumption are inevitable trends. It is therefore of great theoretical significance and practical value to study real-time processing systems for airborne, spaceborne, or missile-borne adaptive processing platforms that offer large-capacity storage and powerful computation.

First, SAR applications are divided into two categories: static target imaging, and moving target detection and imaging. The critical data processing algorithms are then summarized from these two kinds of applications. For each algorithm, a fine-grained parallel algorithm and architecture is proposed. The detailed work is as follows.

1) To design and implement FFT accelerators of different scales, this paper proposes a parameterized template framework and methodology for fine-grained parallel algorithms and architectures. It then presents a performance model and a hardware resource utilization model for the fine-grained parallel FFT accelerators designed with this template-based methodology. Finally, a template-based framework for automatic FFT hardware generation, obtained by extending the templates appropriately, is given; it will be the basis of our next research target.

2) To solve the least squares problem in Space-Time Adaptive Processing (STAP), four matrix decomposition algorithms are analyzed in detail: Givens rotation, Householder transform, Modified Gram-Schmidt (MGS), and LU decomposition with pivoting. We propose a unified framework that combines the three QR decomposition algorithms and the LU decomposition algorithm into a single linear array structure. The QR and LU decomposition algorithms exhibit the same two-level loop structure and the same data dependency. Exploiting these similarities, we derive one fine-grained algorithm that covers all four matrix decomposition algorithms. Furthermore, we present a unified co-processor structure with a scalable linear array of processing elements (PEs). The four types of PEs share the same memory channel structure and PE connections and differ only in the internal structure of the data path. Our unified co-processor, which uses IEEE 754 single-precision floating point, is implemented and mapped onto a Xilinx Virtex-5 FPGA chip. Experimental results show that our co-processors achieve speedups of 2.3 to 14.9 compared with a Pentium Dual CPU with double SSE threads.

3) A fine-grained parallel algorithm and architecture for the SAR imaging system is presented. To handle alternating row and column accesses to a matrix stored in DRAM, this paper presents an optimal memory access theory that analyzes the optimal window size to minimize the total number of page opens and closes, and implements a window-based DRAM controller that effectively relieves the memory wall problem. Moreover, the optimal memory access theory can be used in other fields that need to access the row and column data of a matrix alternately, such as matrix multiplication and image processing. Building on the FFT research, this paper studies fine-grained parallel algorithms and architectures for SAR imaging for the two cases in which on-chip hardware resources are or are not sufficient. Compared with the closely related DM system proposal, our implementation achieves speedups of 2.12 and 2.27 for 64×64 and 256×256 SAR image processing, respectively. For large matrices, our proposal can be expected to achieve higher performance gains owing to the performance potential of our window memory layout scheme.

4) We study the fine-grained parallel algorithm and architecture for STAP, which offers outstanding clutter and jamming suppression. Two or three dimensions of the three-dimensional input data are accessed during the STAP calculation. As in the SAR imaging system, we propose 2-D and 3-D DRAM access modes to alleviate the memory access bottleneck in the STAP system. Building on the FFT and matrix decomposition research, we present fine-grained parallel algorithms and architectures for STAP. Furthermore, a bank-cycle RAM structure and a two-dimensional processing array are proposed for the STAP co-processor. Compared with a Pentium Dual CPU with double SSE threads, the STAP array co-processor with 16 processing elements achieves a speedup of 10.50 times.

5) To calculate the floating-point transcendental functions used in SAR application systems, we propose a hybrid-mode CORDIC algorithm that combines hybrid rotation angle methods with an argument reduction algorithm, reducing hardware area while keeping an unlimited convergence domain for any floating-point input to the functions. We then implement a single-precision floating-point CORDIC co-processor based on the proposed algorithm. Performance tests using three scientific program kernels show that our CORDIC co-processor achieves a maximum speedup of 47.6 times, and 35.2 times on average, compared with a Pentium 4 CPU.
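
As a rough illustration of the idea behind item 5, the sketch below shows a textbook rotation-mode CORDIC for sine and cosine, preceded by a simple argument reduction step that folds any input into the range where the basic iteration converges. It is not the hybrid-mode algorithm of the dissertation, whose details are not given in this abstract. The names reduce_argument, cordic_sin_cos, N_ITER, ANGLES, and GAIN are hypothetical, and the floating-point multiplications by 2^-i stand in for the shift operations a hardware data path would use.

```python
import math

# Assumption: 24 iterations, roughly matching a single-precision mantissa.
N_ITER = 24

# Elementary rotation angles atan(2^-i) and the accumulated CORDIC gain.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = 1.0
for i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def reduce_argument(theta):
    """Fold theta into r in [-pi/2, pi/2] and return (r, sign) such that
    sin(theta) = sign * sin(r) and cos(theta) = sign * cos(r)."""
    r = math.remainder(theta, 2.0 * math.pi)  # r lies in [-pi, pi]
    if r > math.pi / 2:
        return r - math.pi, -1.0
    if r < -math.pi / 2:
        return r + math.pi, -1.0
    return r, 1.0


def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) using rotation-mode CORDIC."""
    r, sign = reduce_argument(theta)
    # Start on the x-axis, pre-scaled so the final vector has unit length.
    x, y, z = 1.0 / GAIN, 0.0, r
    for i in range(N_ITER):
        d = 1.0 if z >= 0.0 else -1.0   # rotate towards z == 0
        shift = 2.0 ** -i               # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]
    return sign * y, sign * x


if __name__ == "__main__":
    for t in (0.5, 2.0, -1234.5):
        s, c = cordic_sin_cos(t)
        print(t, s, c, math.sin(t), math.cos(t))
```

Without the reduction step, the basic iteration only converges for angles within about ±1.74 rad (the sum of the elementary angles), which is why some form of argument reduction is needed to accept arbitrary floating-point inputs.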
Keywords/Search Tags: Synthetic Aperture Radar (SAR), Fine-grained Parallelism, FPGA, Fast Fourier Transform (FFT), Matrix Decomposition, Space-Time Adaptive Processing (STAP), CORDIC