Research And Implementation Of Image Recognition Acceleration Algorithm Based On Heterogeneous Platform

Posted on:2021-05-22

Degree:Master

Type:Thesis

Country:China

Candidate:S Q Hu

Full Text:PDF

GTID:2428330611951371

Subject:Software engineering

Abstract/Summary:

The research and application of artificial intelligence and high-performance heterogeneous computing in video big-data processing are of great significance, yet deploying algorithms efficiently on specific computing platforms remains highly challenging. Set against the background of intelligent surveillance in smart cities, this dissertation addresses the heavy computational load of deep learning, the strict real-time requirements of industrial video-recognition applications, and the complexity and diversity of heterogeneous platforms. It studies three aspects in depth and in practice: image recognition algorithms, heterogeneous computing and parallel programming, and algorithm optimization and acceleration. A comparative analysis of classic image recognition algorithms establishes the important position of the YOLOv3 model in engineering deployment. The model, its framework structure, and its core computing modules are then examined through theoretical analysis, and experiments verify their computing performance.

The dissertation studies CPU + GPU heterogeneous computing platforms and their programming technologies, and proposes a collaborative computing scheme based on the OpenCL unified parallel programming model that uses parallel acceleration strategies such as loop unrolling, vectorization, data rearrangement, multi-threaded parallelism, and memory access optimization. Building on the open-source DarkNet deep learning framework, OpenCL kernel functions are implemented for the convolution, pooling, batch normalization, and activation layers. In addition, the end-to-end automatic optimizing compiler stack TVM is used to rewrite the high-level data flow and generate an optimized computation graph. Tensor descriptions are combined with target-hardware optimization primitives to generate a space of candidate optimized schedules, and an automatic schedule optimizer is built on a machine-learning cost model trained with XGBoost. The optimized model and its parameters are compiled into library files for deployment on CPU and GPU devices.

Experiments show that on the CPU + GPU heterogeneous computing platform, the YOLOv3 algorithm achieves a speedup of 271.8 times relative to the CPU and 1.6 times relative to the GPU. The TVM-optimized deployment is 4.62 times faster than the CPU and 1.46 times faster than the GPU. These results remove the platform limitations of the original algorithm, improve its computing performance and real-time performance, and solve the problems of code restructuring and performance optimization across multiple hardware devices. This is of great significance for porting algorithm functionality and performance to complex and diverse heterogeneous platforms.
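To illustrate what the convolution, batch normalization, and activation kernels compute, the following minimal Python sketch uses the im2col + GEMM lowering that DarkNet's CPU convolution path relies on, followed by inference-mode batch normalization and a leaky ReLU with slope 0.1 (the activation used by YOLOv3's convolutional layers). It is a plain reference implementation, not the dissertation's OpenCL code. The function names, the flat CHW layout, and the default `eps = 1e-5` are assumptions chosen for illustration. An OpenCL port would typically assign one work-item per output element of the GEMM.

```python
import math

def im2col(x, c, h, w, k, stride, pad):
    """Unfold a flat CHW image into a (c*k*k) x (out_h*out_w) matrix of patches."""
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    cols = []
    for ch in range(c):
        for ky in range(k):
            for kx in range(k):
                row = []
                for oy in range(out_h):
                    for ox in range(out_w):
                        iy = oy * stride + ky - pad
                        ix = ox * stride + kx - pad
                        inside = 0 <= iy < h and 0 <= ix < w
                        # Zero padding outside the image border.
                        row.append(x[(ch * h + iy) * w + ix] if inside else 0.0)
                cols.append(row)
    return cols, out_h, out_w

def gemm(a, b):
    """Matrix product a (m x n) times b (n x p), both stored as lists of rows."""
    n, p = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(n)) for j in range(p)] for row in a]

def batchnorm_leaky(y, mean, var, gamma, beta, eps=1e-5, slope=0.1):
    """Per-filter batch normalization (inference form) followed by leaky ReLU."""
    out = []
    for f, row in enumerate(y):
        inv_std = 1.0 / math.sqrt(var[f] + eps)
        normed = [gamma[f] * (v - mean[f]) * inv_std + beta[f] for v in row]
        out.append([z if z > 0 else slope * z for z in normed])
    return out

def conv_bn_leaky(x, c, h, w, weights, k, stride, pad, mean, var, gamma, beta, eps=1e-5):
    """Convolution as im2col + GEMM; each weight row is one filter flattened in (c, ky, kx) order."""
    cols, out_h, out_w = im2col(x, c, h, w, k, stride, pad)
    y = gemm(weights, cols)  # one output row per filter
    return batchnorm_leaky(y, mean, var, gamma, beta, eps), out_h, out_w

img = [float(v) for v in range(1, 10)]  # one-channel 3x3 image
out, oh, ow = conv_bn_leaky(img, 1, 3, 3, [[1.0] * 9], 3, 1, 1,
                            mean=[0.0], var=[1.0], gamma=[1.0], beta=[0.0], eps=0.0)
print(oh, ow, out[0])  # prints 3 3 [12.0, 21.0, 16.0, 27.0, 45.0, 33.0, 24.0, 39.0, 28.0]
```

Lowering convolution to a single matrix product means only one heavily optimized GEMM kernel is needed, which is why it is a common target for the unrolling and vectorization strategies listed in the abstract.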
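Several of the acceleration strategies named above (data rearrangement, loop unrolling, and vectorization) can be shown on the matrix product at the heart of convolution. The hypothetical sketch below is written in Python only to expose the index transformations: B is rearranged (transposed) so each inner product walks contiguous memory, and the inner product is unrolled by four with independent accumulators, mirroring how an OpenCL kernel might process the four lanes of a `float4`. Python itself gains no speed from these changes; the point is that the rewritten loop must produce the same result as the naive one.

```python
def transpose(b):
    """Data rearrangement: store B by columns so each inner product reads contiguous memory."""
    return [list(col) for col in zip(*b)]

def dot_unrolled4(u, v):
    """Dot product unrolled by 4 with independent accumulators (like the lanes of a float4)."""
    n = len(u)
    acc0 = acc1 = acc2 = acc3 = 0.0
    i = 0
    while i + 4 <= n:
        acc0 += u[i] * v[i]
        acc1 += u[i + 1] * v[i + 1]
        acc2 += u[i + 2] * v[i + 2]
        acc3 += u[i + 3] * v[i + 3]
        i += 4
    tail = 0.0
    while i < n:  # remainder loop when n is not a multiple of 4
        tail += u[i] * v[i]
        i += 1
    # Horizontal reduction of the four lanes plus the tail.
    return (acc0 + acc1) + (acc2 + acc3) + tail

def gemm_naive(a, b):
    """Reference triple loop: C[i][j] = sum over t of A[i][t] * B[t][j]."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gemm_optimized(a, b):
    """Same product using the rearranged B and the unrolled inner product.
    In an OpenCL kernel, each (i, j) pair would be handled by one work-item."""
    bt = transpose(b)
    return [[dot_unrolled4(row, col) for col in bt] for row in a]

a = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0, 9.0, 10.0]]
b = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 2.0],
     [1.0, 1.0, 0.0],
     [2.0, 0.0, 1.0],
     [0.0, 3.0, 1.0]]
print(gemm_optimized(a, b))  # prints [[12.0, 20.0, 15.0], [32.0, 45.0, 45.0]]
```

The inner dimension here (5) is deliberately not a multiple of 4, so the remainder loop is exercised; real kernels often pad the data instead to avoid that branch.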
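The TVM step described above searches a space of candidate schedules guided by a learned cost model. The toy sketch below shows that loop under clearly stated assumptions: the schedule space is only pairs of tile sizes, `measure` is a hypothetical synthetic cost standing in for compiling and timing a kernel on the device, and a k-nearest-neighbour regressor is swapped in for XGBoost to keep the example dependency-free. None of these functions belong to TVM's API.

```python
import random

def search_space(n):
    """Toy schedule space: every (tile_x, tile_y) pair whose sizes divide an n x n output."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return [(tx, ty) for tx in divisors for ty in divisors]

def measure(cfg):
    """HYPOTHETICAL stand-in for compiling and timing a kernel on the device (lower is better).
    It simply pretends that 16 x 8 tiles suit the hardware best."""
    tx, ty = cfg
    return abs(tx - 16) + 2 * abs(ty - 8) + 1.0

def predict(history, cfg, k=3):
    """k-nearest-neighbour cost model (swapped in for XGBoost):
    mean measured cost of the k closest already-measured configs."""
    nearest = sorted(history,
                     key=lambda h: (h[0][0] - cfg[0]) ** 2 + (h[0][1] - cfg[1]) ** 2)[:k]
    return sum(cost for _, cost in nearest) / len(nearest)

def tune(space, n_trials, batch=3, seed=0):
    """Cost-model-guided search: measure a few random configs, then repeatedly
    measure the unmeasured configs that the model predicts to be cheapest."""
    rng = random.Random(seed)
    untried = list(space)
    rng.shuffle(untried)
    history = [(cfg, measure(cfg)) for cfg in untried[:batch]]  # random bootstrap
    untried = untried[batch:]
    while len(history) < n_trials and untried:
        untried.sort(key=lambda cfg: predict(history, cfg))
        take = min(batch, n_trials - len(history))
        history += [(cfg, measure(cfg)) for cfg in untried[:take]]
        untried = untried[take:]
    best_cfg, best_cost = min(history, key=lambda h: h[1])
    return best_cfg, best_cost, len(history)

print(tune(search_space(64), n_trials=20))
```

Each round times only the candidates the model ranks best, so far fewer configurations than the full space need to run on real hardware. With XGBoost, `predict` would be replaced by a gradient-boosted tree model trained on richer loop features of each schedule.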

Keywords/Search Tags:

Image Recognition, Heterogeneous Computing, Parallel Acceleration, Automatic Optimization, OpenCL

Related items

1	Parallel Analysis And Acceleration Method Of AES Algorithm Based On OpenCL
2	The Research And Implement Of Video Image Recognition Based On Heterogeneous Computing Platform
3	Research On Parallel Face Recognition Based On OpenCL Acceleration
4	Design And Implementation Of Advertising Image Recognition System Based On Heterogeneous Computing
5	Investigation Of Heterogeneous Acceleration Method For Image Dehazing Based On FPGA+CPU
6	Research And Application Of Multi-GPU Parallel Computing Based On OpenCL
7	Optimization Of Defogging Algorithms And FPGA Acceleration Method Based On Heterogeneous Atmosphere Light Prior
8	The Graphing Of Mathematics Image Base On Heterogeneous Computing With OpenCL
9	Research Of FPGA Heterogeneous Computing Method Based On OpenCL
10	Parallel Accelerated Implementation Of Image Dehazing Algorithm Based On OpenCL