Research On Object Detection Network Algorithm Accelerating Technology Based On ARM NEON

Posted on:2021-05-11

Degree:Master

Type:Thesis

Country:China

Candidate:J Xing

Full Text:PDF

GTID:2428330614950145

Subject:Electrical engineering

Abstract/Summary:


In recent years, deep learning has been successfully applied in object detection, image recognition, speech recognition, natural language processing, and other fields. Because these applications require large amounts of computing resources, they mainly run on enterprise server clusters and other high-performance computing equipment. With the advent of the Internet of Things era, deep learning needs to be deployed in complex practical environments, and the original software and hardware solutions cannot adapt to complex, changeable projects with strict real-time requirements. Finding new software and hardware solutions for deep learning applications is the focus of this research. As the most common mobile processor, ARM offers advantages such as a short development cycle. The ARM NEON instruction set was first used in computation-intensive applications such as multimedia processing. Its single-instruction, multiple-data (SIMD) processing model suits large-scale data computation, so it can also be applied to deep learning. Using the ARM Cortex-A55 as the development platform, this thesis studies techniques for accelerating an object detection network that uses deep learning for its feature extraction layers. The main research contents are as follows.

First, deep learning, object detection network topology, and deep learning acceleration techniques are studied. The concepts and basic ideas of deep learning are reviewed. Taking the object detection network as a typical application of deep learning, the structural characteristics and technical difficulties of the network are analyzed in detail. On the acceleration side, current methods for making deep learning networks lightweight are studied. From this, the software and hardware solution adopted in this thesis is derived.

Second, high-speed parallel implementations of the basic operations used in deep learning are completed. With ARM as the hardware platform, the thesis implements the use of half-precision floating-point numbers, the design of basic loops, matrix multiplication, fast exponential function calculation, and other basic operations. A cache optimization scheme is designed for the ARM processor to improve the temporal and spatial locality of the program. Together these provide the basic computing support for a real-time object detection network.

Third, the optimized design of the object detection network is realized. Each layer of the network is designed, including real-time implementations of the convolution, pooling, fully connected, and softmax layers. For convolution, two calculation methods are implemented and compared: the im2col method has a cache advantage over the direct calculation method, and the Winograd method has an advantage in algorithmic complexity. For the pooling layer, a new model data storage structure is designed. Finally, the whole network is constructed and the forward propagation process is designed.

Fourth, the experimental platform is built. Based on the available experimental conditions, the hardware and software environment of the verification system is established. Comparison experiments running the forward propagation of the object detection network are carried out on an ARM Cortex-A55 development board. An experimental verification method is designed to validate the network, analyzing computing performance, memory access performance, operator performance, and the object detection network as a whole. The results show that the proposed design has advantages in computing performance.
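To make the im2col versus direct-calculation comparison in the abstract concrete, the following is a minimal scalar sketch in C. It is not the thesis's implementation: the function names (`conv_direct`, `im2col`, `conv_from_cols`), the single-channel input, stride 1, and absence of padding are all assumptions chosen for illustration, and no NEON intrinsics are used. The point is only that im2col turns convolution into a dense matrix product over contiguous rows, which is the kind of access pattern that caches and SIMD units handle well.

```c
#include <assert.h>
#include <stddef.h>

/* Direct convolution (hypothetical helper): single channel, stride 1,
 * no padding. in is h x w, kernel is k x k, out is (h-k+1) x (w-k+1),
 * all row-major. */
void conv_direct(const float *in, int h, int w,
                 const float *kernel, int k, float *out)
{
    assert(k >= 1 && k <= h && k <= w);
    int oh = h - k + 1, ow = w - k + 1;
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < k; ++i)
                for (int j = 0; j < k; ++j)
                    acc += in[(y + i) * w + (x + j)] * kernel[i * k + j];
            out[y * ow + x] = acc;
        }
}

/* im2col (hypothetical helper): copy every k x k input patch into one
 * contiguous row of cols. cols has (oh*ow) rows and (k*k) columns. */
void im2col(const float *in, int h, int w, int k, float *cols)
{
    assert(k >= 1 && k <= h && k <= w);
    int oh = h - k + 1, ow = w - k + 1;
    size_t idx = 0;
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x)
            for (int i = 0; i < k; ++i)
                for (int j = 0; j < k; ++j)
                    cols[idx++] = in[(y + i) * w + (x + j)];
}

/* Convolution as a matrix product on the im2col buffer.
 * cols:    rows x kk   (one flattened patch per row)
 * kernels: nf x kk     (one flattened filter per row)
 * out:     nf x rows   (one output map per filter) */
void conv_from_cols(const float *cols, int rows, int kk,
                    const float *kernels, int nf, float *out)
{
    for (int f = 0; f < nf; ++f)
        for (int r = 0; r < rows; ++r) {
            float acc = 0.0f;
            /* Both operands are read sequentially here, so this inner
             * loop is the natural place for SIMD vectorization. */
            for (int c = 0; c < kk; ++c)
                acc += cols[(size_t)r * kk + c] * kernels[(size_t)f * kk + c];
            out[(size_t)f * rows + r] = acc;
        }
}
```

The trade-off in this sketch is memory for locality: the `cols` buffer holds each input element up to k*k times, but the inner product then runs over two contiguous arrays instead of strided windows of the image.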

Keywords/Search Tags:

deep learning, object detection, NEON vectorization, SIMD, acceleration technology


Related items

1	Research Of SIMD Vectorization Algorithm And Regrouping Technology
2	Research On SIMD Auto-vectorization Optimization Technologies
3	Model Compression And Forward Acceleration Based On Embedded Deep Neural Network
4	Research On Automatic SIMD Vectorization Recognization And Code Tuning Technology
5	Research On SIMD Vectorization And Optimization Of Non-Multimedia Applications
6	Research On Vectorization Method For SIMD Super-long Vector Acceleration Components
7	Research On Auto-Vectorization Compiling Techniques Oriented To Irregular Applications On SIMD Extension
8	Research And Implementation Of Robot Drawing Comics Technology Based On Deep Learning
9	Research On Vectorization Technology For Multi-cluster And VLIW DSP
10	Research On Profile-Guided SIMD Vectorization Identification And Optimization