Research Of Multi-core CPU And Many-core GPU Accelerated Parallel Optimization Algorithms

Posted on: 2017-05-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhou
GTID: 1368330512954954
Subject: Computer applications
Abstract/Summary:
Because of constraints in semiconductor technology, power consumption, and instruction-level parallelism, the performance of central processing units (CPUs) is now improved mainly through parallel architectures such as multi-core designs. Meanwhile, graphics processing units (GPUs) have evolved from fixed-function rendering devices into programmable parallel processors, and their architecture is now described as many-core. The theoretical peak performance of parallel processors has increased dramatically, which creates opportunities for large-scale scientific and engineering computation. However, because parallel architectures vary widely and parallel programming is complex, parallelizing an algorithm so that it suits specific parallel hardware remains a major challenge.

Since the emergence of high-level parallel programming models such as OpenCL, CUDA, and DirectCompute, parallel programming has become less complex than before, and simply porting existing algorithms to new hardware is no longer sufficient as a scientific goal. Because parallel architectures differ in their characteristics, parallel algorithms differ in their computational features, and compilers differ in their optimizations, how to parallelize an algorithm on a specific parallel architecture with optimized performance remains a topic of interest.

To address the problems of parallel algorithm optimization on a specific parallel architecture, this thesis studies parallel optimization algorithms on multi-core CPU and many-core GPU architectures. The research combines theory and experiment. On the one hand, the multi-core CPU and many-core GPU architectures are investigated and analyzed, together with the issues that arise when implementing parallel algorithms on them. On the other hand, optimization approaches for parallel algorithms on these architectures are studied, and their effectiveness is validated on public test datasets.
The work and major contributions are as follows:

(1) Dynamic-strategy-based parallel ant colony optimization on GPUs
Existing GPU-based ant colony optimization (ACO) algorithms are limited by on-chip memory and achieve only modest speedups over CPU implementations. We propose a dynamic-strategy-based ACO on GPUs that can solve larger problems than existing algorithms and runs more efficiently.

(2) Parallel ACO on multicore-SIMD CPUs
We further study parallel ACO models on multicore-SIMD CPUs. Building on the traditional task-parallel ACO model, we propose vectorized ACO models and analyze their performance issues. We compare our CPU-based ACO algorithm with existing high-performance GPU-based ACO algorithms, and the results show that the CPU-based algorithm performs better than the GPU-based ones. Given that the theoretical peak performance of the GPU far exceeds that of the CPU, we infer that the multicore-SIMD CPU is better suited than the GPU to irregular, randomized algorithms.

(3) Optimization of parallel iterated local search (ILS) algorithms on GPUs
We propose an optimization approach with quantitative performance analysis for parallel ILS algorithms. ILS is a typical single-solution-based metaheuristic. We study the parallelization of ILS on GPUs and use a quantitative performance analysis model to identify the bounding factor of a GPU-based ILS algorithm. We then optimize the algorithm according to that factor. The experimental results show the effectiveness of our approach. More generally, our method could guide the optimization of other parallel algorithms.

(4) CPU-GPU collaborative computing for image convolution filtering
We study image convolution, one of the most important algorithms in image processing. We propose a CPU-GPU collaborative computing model for parallel image convolution. Two collaborative approaches, static task assignment and dynamic task assignment, are presented and evaluated.
Existing GPU-parallel image convolution algorithms ignore the high-performance computing capability of the modern CPU, leaving many CPU cores idle. In addition, a discrete GPU communicates with the CPU over the PCI-E bus, which introduces considerable overhead. We therefore use both the CPU and the GPU for parallel image convolution. The experimental results demonstrate that this approach can utilize the computing power of both the CPU and the GPU and that it has strong potential.
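
To make the difference between static and dynamic task assignment concrete, the following is a minimal sketch and not the thesis's implementation. The abstract does not say how the image is partitioned, so the row-band split, the names `convolve_rows`, `static_split`, and `dynamic_split`, and the parameters `cpu_share` and `chunk_rows` are all assumptions made for illustration. Two ordinary CPU threads stand in for the CPU and the GPU, so the sketch shows only the assignment logic and gives no real speedup.

```python
import queue
import threading


def convolve_rows(image, kernel, row_start, row_end):
    """2D convolution with zero padding for output rows [row_start, row_end)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = []
    for r in range(row_start, row_end):
        out_row = []
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Each band reads neighbouring rows directly from the
                    # shared image, which plays the role of a halo region.
                    rr, cc = r - i + ph, c - j + pw
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


def static_split(image, kernel, cpu_share):
    """Static assignment: fix the row split up front from an assumed ratio."""
    h = len(image)
    split = max(0, min(h, round(h * cpu_share)))
    results = {}

    def run(name, lo, hi):
        results[name] = convolve_rows(image, kernel, lo, hi)

    # Hypothetical device names; both workers are plain threads here.
    workers = [
        threading.Thread(target=run, args=("cpu", 0, split)),
        threading.Thread(target=run, args=("gpu", split, h)),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results["cpu"] + results["gpu"]


def dynamic_split(image, kernel, chunk_rows):
    """Dynamic assignment: workers pull row chunks from a shared queue."""
    h = len(image)
    tasks = queue.Queue()
    for lo in range(0, h, chunk_rows):
        tasks.put((lo, min(lo + chunk_rows, h)))
    out = [None] * h

    def worker():
        while True:
            try:
                lo, hi = tasks.get_nowait()
            except queue.Empty:
                return
            rows = convolve_rows(image, kernel, lo, hi)
            for k, row in enumerate(rows):
                out[lo + k] = row  # chunks cover disjoint rows

    workers = [threading.Thread(target=worker) for _ in range(2)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return out
```

In this sketch, static assignment depends on choosing `cpu_share` well in advance, whereas dynamic assignment lets whichever worker finishes first take the next chunk, at the cost of queue traffic for every chunk.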
Keywords/Search Tags: parallel computing, multi-core CPU, many-core GPU, metaheuristics, ant colony optimization