
Research On Deep-learning Optimization Technology Based On ARM Processor

Posted on: 2022-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: J J Luo
Full Text: PDF
GTID: 2518306764977159
Subject: Automation Technology

Abstract/Summary:
The deployment of deep-learning models has gradually migrated from large X86+GPU server arrays to small ARM devices. A deep-learning neural network has many layers, and a single layer contains a large amount of data, so the requirements of real-time performance and accuracy pose a great challenge to the computing power of the hardware. Compared with X86 devices and GPUs of the same period, ARM devices run at lower clock frequencies and compute more slowly, and their core counts, cache sizes, pipeline depths, and memory capacities are often inferior as well. As a result, models deployed on ARM devices suffer from poor real-time performance. This thesis therefore proposes a deep-learning optimization method for ARM processors that improves the model's computation rate and storage-space usage.

The thesis starts from the operators of the neural network and optimizes the GEMM operator and the two-dimensional convolution operator. First, the ARM Cortex-A72 architecture is chosen and its cache characteristics are analyzed. The characteristics of the GEMM operator are then analyzed; on this basis, computation is decoupled from data scheduling, and a blocking strategy is used to optimize cache usage. The ARM NEON instruction set is used to accelerate calculations and reduce the number of read/write operations, and, in line with the pipeline characteristics of the ARM processor, instructions are rearranged to further improve pipeline utilization. Second, the input of the two-dimensional convolution operator is expanded into row vectors, which converts the computation into a GEMM; the operator is then optimized with a strategy similar to that used for GEMM. After the computation rate and storage efficiency of both operators have been optimized, the operators are integrated into TVM, a deep-learning compiler framework, so that they can take part in the compilation and deployment of real deep-learning models and thereby optimize the whole network.

Finally, experimental results show that the proposed optimization method effectively improves the cache hit rate and computation speed of the network's operators during inference, verifying the effectiveness of the ARM-based optimization method in this thesis.
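The abstract gives no kernels, so the following is only a minimal C sketch of the cache-blocking and NEON vectorization idea it describes, assuming single-precision row-major matrices; the tile sizes MC/KC/NC and the name sgemm_blocked are illustrative placeholders, not the thesis's tuned values or code.

    /* Minimal sketch: cache-blocked SGEMM with a NEON inner loop.
     * Tile sizes are illustrative, not the tuned values from the
     * thesis.  Build for AArch64, e.g.: gcc -O2 -c sgemm_sketch.c */
    #include <arm_neon.h>

    #define MC 64   /* rows of A kept hot in cache per block */
    #define KC 128  /* shared dimension per block            */
    #define NC 64   /* columns of B per block                */

    /* C[MxN] += A[MxK] * B[KxN], all row-major. */
    void sgemm_blocked(int M, int N, int K,
                       const float *A, const float *B, float *C)
    {
        for (int jc = 0; jc < N; jc += NC) {
            int nb = (N - jc < NC) ? N - jc : NC;
            for (int pc = 0; pc < K; pc += KC) {
                int kb = (K - pc < KC) ? K - pc : KC;
                for (int ic = 0; ic < M; ic += MC) {
                    int mb = (M - ic < MC) ? M - ic : MC;
                    /* Inner kernel over the current tile. */
                    for (int i = 0; i < mb; i++) {
                        for (int j = 0; j + 4 <= nb; j += 4) {
                            /* Accumulate 4 outputs at once with NEON. */
                            float32x4_t acc = vld1q_f32(&C[(ic+i)*N + jc+j]);
                            for (int p = 0; p < kb; p++) {
                                float32x4_t b = vld1q_f32(&B[(pc+p)*N + jc+j]);
                                acc = vfmaq_n_f32(acc, b, A[(ic+i)*K + pc+p]);
                            }
                            vst1q_f32(&C[(ic+i)*N + jc+j], acc);
                        }
                        /* Scalar tail for nb not divisible by 4. */
                        for (int j = nb & ~3; j < nb; j++) {
                            float s = C[(ic+i)*N + jc+j];
                            for (int p = 0; p < kb; p++)
                                s += A[(ic+i)*K + pc+p] * B[(pc+p)*N + jc+j];
                            C[(ic+i)*N + jc+j] = s;
                        }
                    }
                }
            }
        }
    }

A production kernel along the thesis's lines would additionally pack each A and B tile into contiguous buffers so the inner loop reads memory sequentially, which is what makes the blocking pay off in cache hit rate.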
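The instruction-rearrangement step mentioned above can be pictured as keeping several independent fused-multiply-add chains in flight so the processor pipeline is not stalled by one serial dependency. The sketch below shows this general technique on a simple dot product; it is an illustration of the idea, not the thesis's actual kernel, and it assumes AArch64 NEON and a length that is a multiple of 16 to keep it short.

    /* Minimal sketch of the instruction-scheduling idea: several
     * independent accumulators break the dependency chain on a single
     * register, so consecutive FMAs can issue back-to-back instead of
     * each waiting for the previous result. */
    #include <arm_neon.h>

    /* Dot product of two length-n float arrays (n a multiple of 16). */
    float dot_unrolled(const float *x, const float *y, int n)
    {
        float32x4_t acc0 = vdupq_n_f32(0.0f), acc1 = vdupq_n_f32(0.0f);
        float32x4_t acc2 = vdupq_n_f32(0.0f), acc3 = vdupq_n_f32(0.0f);
        for (int i = 0; i < n; i += 16) {
            /* Four independent FMAs per iteration: no serial dependency. */
            acc0 = vfmaq_f32(acc0, vld1q_f32(x + i),      vld1q_f32(y + i));
            acc1 = vfmaq_f32(acc1, vld1q_f32(x + i + 4),  vld1q_f32(y + i + 4));
            acc2 = vfmaq_f32(acc2, vld1q_f32(x + i + 8),  vld1q_f32(y + i + 8));
            acc3 = vfmaq_f32(acc3, vld1q_f32(x + i + 12), vld1q_f32(y + i + 12));
        }
        float32x4_t s = vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3));
        return vaddvq_f32(s);  /* horizontal sum (AArch64) */
    }

In hand-written assembly, the same goal is reached by interleaving loads and FMAs from independent chains, which is the instruction rearrangement the abstract refers to.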
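Expanding the two-dimensional convolution input into row vectors so that convolution becomes a GEMM is the transformation commonly known as im2col. Below is a minimal C sketch under simplifying assumptions (stride 1, no padding, a single image in channel-major layout); the function name and layout are illustrative, not taken from the thesis.

    /* Minimal im2col sketch: unpack each receptive field of the input
     * into one row of a matrix, so conv2d reduces to a single GEMM
     * with the flattened filter bank. */
    #include <stddef.h>

    /* in:  C x H x W input tensor (contiguous)
     * out: (H-KH+1)*(W-KW+1) rows, each of length C*KH*KW */
    void im2col(const float *in, int C, int H, int W,
                int KH, int KW, float *out)
    {
        int OH = H - KH + 1, OW = W - KW + 1;
        size_t col = 0;
        for (int oh = 0; oh < OH; oh++)
            for (int ow = 0; ow < OW; ow++)      /* one output pixel... */
                for (int c = 0; c < C; c++)      /* ...becomes one row  */
                    for (int kh = 0; kh < KH; kh++)
                        for (int kw = 0; kw < KW; kw++)
                            out[col++] =
                                in[(size_t)c*H*W + (oh+kh)*W + (ow+kw)];
    }

Afterwards, convolution with F filters of shape C x KH x KW is the product of the (OH*OW) x (C*KH*KW) output matrix with the transposed flattened filters, i.e. a GEMM that can reuse the blocked NEON kernel sketched earlier, which is the strategy-sharing the abstract describes.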
Keywords/Search Tags:Embedded devices, NEON, Operator optimization, Deep-learning compiler