
Research On Deep-learning Optimization Technology Based On ARM Processor

Posted on: 2022-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: J J Luo
Full Text: PDF
GTID: 2518306764977159
Subject: Automation Technology

Abstract/Summary:
The deployment of deep-learning models has gradually migrated from large X86+GPU server arrays to small ARM devices. A deep-learning neural network has many layers, and a single layer contains a large amount of data, so the requirements of real-time performance and accuracy pose a great challenge to the computing power of the hardware. Compared with X86 devices and GPUs of the same period, ARM devices run at lower clock frequencies and compute more slowly, and their core counts, cache sizes, pipeline depths, and memory capacities are often inferior as well. As a result, models deployed on ARM devices suffer from poor real-time performance. This thesis therefore proposes a deep-learning optimization method for ARM processors that improves the model's computation rate and storage-space usage.

The thesis starts from the operators of the neural network and optimizes the GEMM operator and the two-dimensional convolution operator. First, the ARM Cortex-A72 architecture is chosen and its cache characteristics are analyzed. The characteristics of the GEMM operator are then analyzed; on this basis, computation is decoupled from data scheduling, and a blocking strategy is used to optimize cache usage. The ARM NEON instruction set is used to accelerate calculations and reduce the number of read/write operations, and, in line with the pipeline characteristics of the ARM processor, instructions are rearranged to further improve pipeline utilization. Second, the input of the two-dimensional convolution operator is expanded into row vectors, which converts the computation into a GEMM; the operator is then optimized with a strategy similar to that used for GEMM. After the computation rate and storage efficiency of both operators have been optimized, the operators are integrated into TVM, a deep-learning compiler framework, so that they can take part in the compilation and deployment of real deep-learning models and thereby optimize the whole network.

Finally, experimental results show that the proposed optimization method effectively improves the cache hit rate and computation speed of the network's operators during inference, verifying the effectiveness of the ARM-based optimization method in this thesis.
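The abstract gives no kernels, so the following is only a minimal C sketch of the cache-blocking and NEON vectorization idea it describes, assuming single-precision row-major matrices; the tile sizes MC/KC/NC and the name sgemm_blocked are illustrative placeholders, not the thesis's tuned values or code.

    /* Minimal sketch: cache-blocked SGEMM with a NEON inner loop.
     * Tile sizes are illustrative, not the tuned values from the
     * thesis.  Build for AArch64, e.g.: gcc -O2 -c sgemm_sketch.c */
    #include <arm_neon.h>

    #define MC 64   /* rows of A kept hot in cache per block */
    #define KC 128  /* shared dimension per block            */
    #define NC 64   /* columns of B per block                */

    /* C[MxN] += A[MxK] * B[KxN], all row-major. */
    void sgemm_blocked(int M, int N, int K,
                       const float *A, const float *B, float *C)
    {
        for (int jc = 0; jc < N; jc += NC) {
            int nb = (N - jc < NC) ? N - jc : NC;
            for (int pc = 0; pc < K; pc += KC) {
                int kb = (K - pc < KC) ? K - pc : KC;
                for (int ic = 0; ic < M; ic += MC) {
                    int mb = (M - ic < MC) ? M - ic : MC;
                    /* Inner kernel over the current tile. */
                    for (int i = 0; i < mb; i++) {
                        for (int j = 0; j + 4 <= nb; j += 4) {
                            /* Accumulate 4 outputs at once with NEON. */
                            float32x4_t acc = vld1q_f32(&C[(ic+i)*N + jc+j]);
                            for (int p = 0; p < kb; p++) {
                                float32x4_t b = vld1q_f32(&B[(pc+p)*N + jc+j]);
                                acc = vfmaq_n_f32(acc, b, A[(ic+i)*K + pc+p]);
                            }
                            vst1q_f32(&C[(ic+i)*N + jc+j], acc);
                        }
                        /* Scalar tail for nb not divisible by 4. */
                        for (int j = nb & ~3; j < nb; j++) {
                            float s = C[(ic+i)*N + jc+j];
                            for (int p = 0; p < kb; p++)
                                s += A[(ic+i)*K + pc+p] * B[(pc+p)*N + jc+j];
                            C[(ic+i)*N + jc+j] = s;
                        }
                    }
                }
            }
        }
    }

A production kernel along the thesis's lines would additionally pack each A and B tile into contiguous buffers so the inner loop reads memory sequentially, which is what makes the blocking pay off in cache hit rate.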
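The instruction-rearrangement step mentioned above can be pictured as keeping several independent fused-multiply-add chains in flight so the processor pipeline is not stalled by one serial dependency. The sketch below shows this general technique on a simple dot product; it is an illustration of the idea, not the thesis's actual kernel, and it assumes AArch64 NEON and a length that is a multiple of 16 to keep it short.

    /* Minimal sketch of the instruction-scheduling idea: several
     * independent accumulators break the dependency chain on a single
     * register, so consecutive FMAs can issue back-to-back instead of
     * each waiting for the previous result. */
    #include <arm_neon.h>

    /* Dot product of two length-n float arrays (n a multiple of 16). */
    float dot_unrolled(const float *x, const float *y, int n)
    {
        float32x4_t acc0 = vdupq_n_f32(0.0f), acc1 = vdupq_n_f32(0.0f);
        float32x4_t acc2 = vdupq_n_f32(0.0f), acc3 = vdupq_n_f32(0.0f);
        for (int i = 0; i < n; i += 16) {
            /* Four independent FMAs per iteration: no serial dependency. */
            acc0 = vfmaq_f32(acc0, vld1q_f32(x + i),      vld1q_f32(y + i));
            acc1 = vfmaq_f32(acc1, vld1q_f32(x + i + 4),  vld1q_f32(y + i + 4));
            acc2 = vfmaq_f32(acc2, vld1q_f32(x + i + 8),  vld1q_f32(y + i + 8));
            acc3 = vfmaq_f32(acc3, vld1q_f32(x + i + 12), vld1q_f32(y + i + 12));
        }
        float32x4_t s = vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3));
        return vaddvq_f32(s);  /* horizontal sum (AArch64) */
    }

In hand-written assembly, the same goal is reached by interleaving loads and FMAs from independent chains, which is the instruction rearrangement the abstract refers to.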
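Expanding the two-dimensional convolution input into row vectors so that convolution becomes a GEMM is the transformation commonly known as im2col. Below is a minimal C sketch under simplifying assumptions (stride 1, no padding, a single image in channel-major layout); the function name and layout are illustrative, not taken from the thesis.

    /* Minimal im2col sketch: unpack each receptive field of the input
     * into one row of a matrix, so conv2d reduces to a single GEMM
     * with the flattened filter bank. */
    #include <stddef.h>

    /* in:  C x H x W input tensor (contiguous)
     * out: (H-KH+1)*(W-KW+1) rows, each of length C*KH*KW */
    void im2col(const float *in, int C, int H, int W,
                int KH, int KW, float *out)
    {
        int OH = H - KH + 1, OW = W - KW + 1;
        size_t col = 0;
        for (int oh = 0; oh < OH; oh++)
            for (int ow = 0; ow < OW; ow++)      /* one output pixel... */
                for (int c = 0; c < C; c++)      /* ...becomes one row  */
                    for (int kh = 0; kh < KH; kh++)
                        for (int kw = 0; kw < KW; kw++)
                            out[col++] =
                                in[(size_t)c*H*W + (oh+kh)*W + (ow+kw)];
    }

Afterwards, convolution with F filters of shape C x KH x KW is the product of the (OH*OW) x (C*KH*KW) output matrix with the transposed flattened filters, i.e. a GEMM that can reuse the blocked NEON kernel sketched earlier, which is the strategy-sharing the abstract describes.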
Keywords/Search Tags:Embedded devices, NEON, Operator optimization, Deep-learning compiler