Font Size: a A A

Research On Parallel Algorithm Design And Optimization For Image Matching With Heterogeneous Architectures

Posted on:2017-06-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:X X TangFull Text:PDF
GTID:1368330590990810Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Traditionally,CPUs could improve their performance by increasing the core fre-quency.However,this mode turned out to be unsustainable due to the famous "power wall".In this case,parallel architectures become the new trend for computer proces-sors.Driven by various applications and the limitation of energy consumption,hetero-geneous parallel processors become more and more popular.Among them,multicore CPUs,manycore GPUs and co-processor Xeon Phi are typical representatives.Com-pared with serial processors,the parallel ones usually contain much more computing cores,which contribute a lot to their peak performance.However,during the programming of heterogeneous computing,programmers face great challenges:firstly,with the increasing peak performance of heterogeneous architectures,any level within the system could become new performance bottlenecks that can lead to poor performance;secondly,to keep a good balance between perfor-mance and energy consumption,the systems become highly heterogeneous that can make software migration between the different computing architectures a complicated job;thirdly,driven by the various demands of applications,programmers have to learn to use several architectures at the same time and make them work cooperatively,which would further increase the difficulties of heterogeneous programming.Due to the above reasons,heterogeneous computing is still very challenging for most programmers.To better utilize the massive computing power of heterogeneous architectures,we need to make new designs and optimizations to the software,especially to their core algorithms.In this thesis,we have studied several image matching problems with het-erogeneous computing and the main achievements are listed as follows:·We have designed and implemented a highly space-efficient and easy-to-parallel pixel-based image matching algorithm:Pixel-based image matching algorithms usually involve both high time complexity and high space complexity.In this work,we focus on reducing its space complexity and improving its parallel per-formance.We propose a new pixel-based multi-anchor algorithm for image matching,which greatly reduces its memory consumption so that it can fit in different architectures.Then we propose both fine-grained and coarse-grained ways to parallelize the algorithm on multicore CPUs.Based on the SIMT model of GPU,we modify its data structures so that it can better utilize the high band-width of GPU memory.Experimental results show that these optimizations are very efficient in improving the performance of our new algorithm.·We have summarized several general performance bottlenecks on heterogeneous architectures for feature-based image matching algorithms:feature-based image matching algorithms have been widely used in machine learning and computer vision.In this work,we focus on performance analysis by testing several match-ing algorithms with different image data on multiple heterogeneous architectures.During the test,we find several performance bottlenecks,e.g.,low utilization of last-level cache,contention on memory accesses,and contention on system calls,etc.Based on the data collected during runtime,we are able to find the real reasons for these bottlenecks and propose a Divide-and-Merge method to avoid them.Experimental results show that this method is general and efficient in improving the performance of feature-based matching algorithms.·We have designed and implemented a feature filtering algorithm which can dy-namically adjust its precision:Approximate algorithms are very popular in fea-ture matching problems.However,how to achieve both good parallel perfor-mance and good precision is still a hard task in recent research.In this work,we focus on improving parallel performance while maintaining a good precision of the related algorithms.We propose a general feature filtering algorithm that can greatly reduce the computation and memory accesses during the feature match-ing process.We also design a method which can dynamically adjust the precision of the filtering process so that it can fit with different data distributions.Exper-imental results show that our new filtering algorithm is able to achieve a good balance between performance and precision on heterogeneous architectures.·We have designed and implemented an efficient k-selection algorithm on GPU:We find that k-selection takes the most amount of time during feature match-ing on GPU.In this work,we focus on designing a new k-selection algorithm based on the characteristics of the GPU architecture and k-selection process.A new merge queue data structure is proposed to efficiently maintain the k smallest values during the search;a buffered search technique is proposed to reduce the performance influence caused by branch operations within threads in the same warp;a hierarchical partition method is proposed to reduce the amount of values to be searched.Experimental results show that these new methods can signifi-cantly reduce the amount of time taken by k-selection on GPU.In summary,this thesis focuses on the problems of parallel algorithm analysis,optimizations,design and implementations for image matching problems with hetero-geneous architectures.To achieve these goals,we have summarized several general performance bottlenecks and optimization techniques appeared in our case studies;we have also designed new algorithms that can fit in different architectures with good performance.
Keywords/Search Tags:Heterogeneous Computing, Parallel Algorithms, Multicore CPU, Manycore GPU, Coprocessor Xeon Phi, Performance Analysis, Pixel-based Matching, Feature-based Matching, Approximate Computing, Selection Algorithms
PDF Full Text Request
Related items