
Research And Implementation Of Image Recognition Acceleration Algorithm Based On Heterogeneous Platform

Posted on: 2021-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: S Q Hu
GTID: 2428330611951371  Subject: Software engineering
Abstract/Summary:
Applying artificial intelligence and high-performance heterogeneous computing to video big-data processing is of great practical importance, yet deploying algorithms efficiently on specific computing platforms remains challenging. Set against the background of intelligent video monitoring for smart cities, this dissertation addresses problems such as the heavy computational load of deep learning, the strict real-time requirements of industrial video-recognition applications, and the complexity and diversity of heterogeneous platforms. It studies three areas in depth, both in theory and in practice: image recognition algorithms, heterogeneous computing and parallel programming, and algorithm optimization and acceleration.

A comparative analysis of classic image recognition algorithms establishes the important role of the YOLOv3 model in engineering deployment. The dissertation then examines YOLOv3's model, framework structure, and core computing modules, and verifies their computational performance through theoretical analysis and experiments.

After studying the CPU + GPU heterogeneous computing platform and its programming techniques, the dissertation proposes a collaborative computing scheme based on the OpenCL unified parallel programming model, using parallel acceleration strategies such as loop unrolling, vectorization, data rearrangement, multi-threaded parallelism, and memory access optimization. Building on the open-source DarkNet deep learning framework, it uses OpenCL to implement kernel functions for the convolution, pooling, batch normalization, and activation layers.

The dissertation also uses TVM, an end-to-end automatic optimizing compiler stack. TVM rewrites the high-level dataflow to produce an optimized computation graph, combines tensor descriptions with target-hardware optimization primitives to generate a space of candidate optimized schedules, and builds an automatic schedule optimizer driven by a machine-learning cost model, which is trained and optimized with XGBoost. The optimized model and its parameters are then compiled into library files for deployment on CPU and GPU devices.

Experiments show that on the CPU + GPU heterogeneous computing platform, the YOLOv3 algorithm achieves a speedup of 271.8 times relative to the CPU and 1.6 times relative to the GPU. The TVM-optimized deployment is 4.62 times faster than the CPU and 1.46 times faster than the GPU. This work removes the platform restrictions of the original algorithm, improves both its computational performance on the platform and its real-time performance, and solves the problem of code refactoring and performance optimization across multiple hardware devices. It is of significant value for porting algorithm functionality and performance to complex and diverse heterogeneous platforms.
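The abstract lists data rearrangement, loop unrolling, and memory access optimization among the strategies applied to the convolution layer, but does not show how they fit together. As a rough illustration only, the plain C sketch below (not the author's OpenCL kernels) lowers a single-image convolution to a matrix multiply with im2col, an approach similar in spirit to DarkNet's CPU convolution path, and unrolls the innermost GEMM loop by 4. The function names `im2col`, `gemm_unrolled`, and `conv2d`, the CHW input layout, and the filter-major weight layout are hypothetical choices made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: read pixel (row, col) of one channel of a CHW image,
   returning 0 for positions that fall in the zero-padding border. */
static float get_pixel(const float *im, int h, int w, int row, int col,
                       int channel, int pad)
{
    row -= pad;
    col -= pad;
    if (row < 0 || col < 0 || row >= h || col >= w) return 0.0f;
    return im[(channel * h + row) * w + col];
}

/* Data rearrangement (im2col): each row r of `cols` holds, for one
   (channel, kernel row, kernel col) triple, the input value seen by every
   output position. `cols` must hold channels*ksize*ksize * out_h*out_w floats. */
void im2col(const float *im, int channels, int h, int w,
            int ksize, int stride, int pad, float *cols)
{
    int out_h = (h + 2 * pad - ksize) / stride + 1;
    int out_w = (w + 2 * pad - ksize) / stride + 1;
    int rows = channels * ksize * ksize;
    for (int r = 0; r < rows; ++r) {
        int w_off = r % ksize;            /* kernel column */
        int h_off = (r / ksize) % ksize;  /* kernel row */
        int c = r / ksize / ksize;        /* input channel */
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x)
                cols[(r * out_h + y) * out_w + x] =
                    get_pixel(im, h, w, h_off + y * stride,
                              w_off + x * stride, c, pad);
    }
}

/* C = A * B, row-major; A is M x K, B is K x N, C is M x N.
   The i-k-j loop order reads B and writes C contiguously, and the
   innermost loop is unrolled by 4 with a scalar tail for leftover columns. */
void gemm_unrolled(int M, int N, int K, const float *A, const float *B,
                   float *C)
{
    for (int i = 0; i < M; ++i) {
        float *c_row = C + (size_t)i * N;
        for (int j = 0; j < N; ++j) c_row[j] = 0.0f;
        for (int k = 0; k < K; ++k) {
            float a = A[(size_t)i * K + k];
            const float *b_row = B + (size_t)k * N;
            int j = 0;
            for (; j + 3 < N; j += 4) {   /* unrolled body: 4 outputs per step */
                c_row[j]     += a * b_row[j];
                c_row[j + 1] += a * b_row[j + 1];
                c_row[j + 2] += a * b_row[j + 2];
                c_row[j + 3] += a * b_row[j + 3];
            }
            for (; j < N; ++j) c_row[j] += a * b_row[j];  /* tail */
        }
    }
}

/* Convolution as im2col + GEMM. Weights are laid out as
   filters x (channels * ksize * ksize); `out` is filters x out_h x out_w. */
void conv2d(const float *im, int channels, int h, int w,
            const float *weights, int filters, int ksize, int stride, int pad,
            float *workspace, float *out)
{
    int out_h = (h + 2 * pad - ksize) / stride + 1;
    int out_w = (w + 2 * pad - ksize) / stride + 1;
    im2col(im, channels, h, w, ksize, stride, pad, workspace);
    gemm_unrolled(filters, out_h * out_w, channels * ksize * ksize,
                  weights, workspace, out);
}
```

For example, convolving the 3x3 image `1..9` with the 2x2 kernel `{1, 0, 0, 1}` at stride 1 and no padding yields `{6, 8, 12, 14}`. In an OpenCL version, the loop over output columns would typically be mapped to work-items, with vector types such as `float4` playing the role of the manual 4-way unrolling.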
Keywords/Search Tags:Image Recognition, Heterogeneous Computing, Parallel Acceleration, Automatic Optimization, OpenCL