Research On High Efficiency Heterogeneous Parallel Computing Based On CPU+GPU In Image Matching

Posted on: 2012-06-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Xiao
Full Text: PDF
GTID: 1118330344452153
Subject: Photogrammetry and Remote Sensing
Abstract/Summary:
The rapid development of multi-core CPUs and Graphics Processing Units (GPUs) has not only advanced related applied technologies such as image processing, virtual reality, and computer simulation, but has also provided a low-power platform with a good price/performance ratio for general-purpose computing beyond graphics processing. General-purpose computing on GPUs has therefore become a very active research topic in high-performance computing. With the continuous development of sensor technology, the means of acquiring surface information have become increasingly diverse and fast. Faced with diverse data sources and data volumes that keep doubling, many conventional algorithms cannot meet the demand for high-speed computation on large-scale data. The increasing programmability and computational power of the GPUs in modern graphics hardware offer great scope for accelerating photogrammetry and remote sensing algorithms that can be parallelized. This dissertation analyzes in detail massively parallel GPU computing for image matching problems and proposes effective solutions. The specific contributions are outlined below.
(1) Based on a heterogeneous many-core architecture consisting of CPU and GPU, a general scheme for image processing is derived by studying the GPU parallelization of four algorithms associated with image matching in photogrammetry and remote sensing. Design patterns for GPU-based massively parallel computing in image processing are explored. General GPU-based parallel schemes for image processing need to be evaluated in advance in terms of data accuracy, latency, computational load, and similar factors.
In addition, algorithm design and optimization should adopt parallel computing methods such as function and data partitioning and thread mapping, together with optimization strategies such as memory access optimization, communication optimization, and instruction optimization. When designing and optimizing general GPU-based schemes for image processing, factors such as the GPU architecture and the characteristics of the problem being solved should be considered as a whole. With trial and error, the desired performance can ultimately be achieved. Starting from the differences between GPU and CPU, the acceleration principles of the GPU are analyzed, and the general-purpose computing model of the mature Compute Unified Device Architecture (CUDA) framework and its characteristics are discussed.
(2) A parallel image enhancement algorithm for the Wallis transform based on multi-GPU acceleration is proposed. Using the computing power of the GPU and the parallel computing architecture of CUDA, a fast Wallis transform image filter is implemented on a personal computer. A large-scale thread division method is put forward together with the task division on the GPU. The algorithm is accelerated through the use of shared memory and coalesced global memory access. Threads that compute the same computing subspace are properly synchronized through shared memory within a thread block. The GPU and CPU speeds for the Wallis image transform are compared. The experimental results show that the parallel Wallis transform algorithm achieves a speedup of two orders of magnitude. The method performs very well in real-time processing.
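As a rough illustration of why the Wallis transform parallelizes well, the sketch below applies one commonly cited form of the filter pixel by pixel in plain Python. It is not the dissertation's CUDA implementation: the formula variant, the per-pixel window statistics (practical implementations often use block statistics with interpolation), and the names `wallis_pixel`, `local_stats`, `wallis_filter`, and the defaults for `m_f`, `s_f`, `c`, `b` are all assumptions made for this example.

```python
import math

def wallis_pixel(g, m_g, s_g, m_f=127.0, s_f=50.0, c=0.8, b=0.9):
    """Map gray value g using its local mean m_g and standard deviation s_g.

    m_f, s_f: target mean and standard deviation (assumed defaults).
    c: contrast expansion constant, b: brightness constant, both in [0, 1].
    """
    r1 = c * s_f / (c * s_g + (1.0 - c) * s_f)
    return (g - m_g) * r1 + b * m_f + (1.0 - b) * m_g

def local_stats(img, y, x, radius):
    """Mean and standard deviation of the window around (y, x), clipped at borders."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, math.sqrt(var)

def wallis_filter(img, radius=1, **params):
    """Apply the per-pixel Wallis mapping to a 2D list of gray values."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    # Every (y, x) is independent of the others, so on a GPU each pixel
    # could be handled by its own thread.
    for y in range(h):
        for x in range(w):
            m_g, s_g = local_stats(img, y, x, radius)
            v = wallis_pixel(img[y][x], m_g, s_g, **params)
            out[y][x] = min(255.0, max(0.0, v))  # clamp to the 8-bit range
    return out
```

Because each output pixel depends only on a small neighborhood of the input, neighboring threads read overlapping windows, which is the kind of access pattern that shared memory is typically used to serve.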
The parallel Wallis algorithm thereby speeds up image enhancement and significantly reduces computing time.
(3) A GPU-based parallel algorithm for Harris corner detection with multi-device control is presented, so that the time-consuming Gaussian convolution filtering step of the corner detection process can be carried out by many parallel threads. The implementation of this Single Instruction Multiple Thread (SIMT) parallel algorithm using the shared memory, constant memory, and pinned host memory mechanisms of CUDA is detailed. The experiments show that the multi-GPU parallel Harris corner detection algorithm is nearly 60 times faster than the traditional Harris corner detection algorithm implemented on the CPU.
(4) A fast parallel algorithm for correlation coefficient image matching is presented based on the CUDA architecture. The algorithm performs high-performance parallel computing in the SIMT pattern. Based on the parallel architecture and hardware characteristics of the GPU, the algorithm introduces three speedup methods to improve performance: execution configuration, high-speed storage, and global storage techniques, which optimize the data storage structure and improve data access efficiency. The experimental results show that the parallel algorithm takes full advantage of the GPU's parallel processing capability and obtains the highest multiprocessor warp occupancy, with a processing speed nearly 20 times faster than the CPU-based implementation.
(5) A parallel algorithm for Scale Invariant Feature Transform (SIFT) feature matching on the many-core architecture of CPU and GPU is proposed, which optimizes the data storage structure and improves data access efficiency. Experimental results show that the CUDA implementation achieves a speedup of more than 27 times over a serial CPU implementation of SIFT feature matching.
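To make the data-parallel structure of feature matching concrete, here is a minimal brute-force descriptor matcher in plain Python. It is only a sketch: the abstract does not state the matching criterion, so the nearest-neighbor search with a distance ratio test is an assumption, as are the names `squared_distance` and `match_descriptors` and the default `ratio` value.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_descriptors(queries, candidates, ratio=0.8):
    """Return (query_index, candidate_index) pairs that pass the ratio test.

    A match is kept only if the nearest candidate is clearly closer than
    the second nearest one (assumed criterion, not taken from the source).
    """
    matches = []
    # Each query descriptor is processed independently, so the outer loop
    # maps naturally onto one GPU thread (or thread block) per query.
    for qi, q in enumerate(queries):
        best, second, best_idx = float("inf"), float("inf"), -1
        for ci, c in enumerate(candidates):
            d = squared_distance(q, c)
            if d < best:
                second, best, best_idx = best, d, ci
            elif d < second:
                second = d
        # Distances are squared, so the ratio threshold is squared as well.
        if best_idx >= 0 and best < (ratio ** 2) * second:
            matches.append((qi, best_idx))
    return matches
```

For example, `match_descriptors([[0, 0]], [[1, 0], [0, 1]])` returns an empty list, because both candidates are equally close and the match is ambiguous.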
With the GPU, the real-time processing ability of the SIFT feature matching algorithm can be greatly improved in practical applications.
(6) CPU- and GPU-based image matching systems are integrated, including a single-GPU/multi-GPU accelerated Wallis-Harris-correlation coefficient (WHR) image matching system and a single-GPU/multi-GPU accelerated Wallis-SIFT (WS) image matching system. Experimental results show that the GPU implementation of the WHR system achieves up to a 37 times speedup over the CPU version, and the GPU implementation of the WS system achieves up to a 39 times speedup over the CPU version.
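The correlation coefficient used in item (4) and in the WHR system can be sketched as an exhaustive template search in plain Python. This is an illustrative serial version under stated assumptions, not the dissertation's CUDA code: the function names `correlation_coefficient`, `window`, and `best_match`, the square template, and the convention of returning 0.0 for a constant window are all choices made for this example.

```python
import math

def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equally long gray-value lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    # A constant window has no texture; treat it as uncorrelated (assumption).
    return cov / denom if denom > 0 else 0.0

def window(img, top, left, size):
    """Flatten the size x size window whose top-left corner is (top, left)."""
    return [img[top + j][left + i] for j in range(size) for i in range(size)]

def best_match(template, search):
    """Return (coefficient, (top, left)) of the best template position."""
    size = len(template)
    t = [v for row in template for v in row]
    best_rho, best_pos = -2.0, None
    # Every candidate position is scored independently, so a GPU version
    # could assign one thread per (top, left) position.
    for top in range(len(search) - size + 1):
        for left in range(len(search[0]) - size + 1):
            rho = correlation_coefficient(t, window(search, top, left, size))
            if rho > best_rho:
                best_rho, best_pos = rho, (top, left)
    return best_rho, best_pos
```

Because the coefficient is invariant to linear changes in brightness and contrast, a template that is a scaled and shifted copy of a search window still scores approximately 1 at the correct position.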
Keywords/Search Tags: image matching, Wallis transform, Harris corner detection, correlation coefficient, Scale Invariant Feature Transform feature, parallel computing, Graphics Processing Unit, Compute Unified Device Architecture