Embedded Image Recognition System And Optimization Based On Convolutional Neural Network

Posted on:2020-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:R Wang

GTID:2428330602450786

Subject:Engineering

Abstract/Summary:

This thesis studies convolutional neural networks on a data-parallel DSP processor that combines multi-core, SIMD, and VLIW technology, and implements image classification on embedded devices. The work combines the architectural features of the FT-M7002 with the core algorithms of CNN image classification. It uses five specific classes of target images [1] (oil depot, airport, dam, railway, and tower) and the corresponding CNN classification model to verify the accuracy and real-time performance of image classification on the FT-M7002. The main optimization methods are memory access optimization at the storage level, instruction-level parallel optimization, and data-level parallelism.

Memory access optimization is based on the DSP's multi-level storage architecture. It takes into account the sizes of on-chip and off-chip memory and the behavior of the cache, and uses them to raise the cache hit rate and shorten memory access cycles. It also reduces conflicts between concurrent CPU and DMA accesses to improve memory bandwidth utilization. Finally, it arranges data movement and data placement to suit each algorithm, which reduces computation and program running time.

Instruction-level parallel optimization is based on the VLIW technology of the DSP architecture. The goal is to issue more instructions in the same clock cycle so that multiple independent functional units are fully utilized. Loop unrolling and software pipelining reduce CPU idle cycles and allow multiple instructions to be issued in parallel, which improves performance.

Data-level parallelism is based on the SIMD technology of the DSP architecture. Vector programming is used to make full use of the data-parallel processing capability of the vector unit.

Finally, multi-core parallelism distributes the computing tasks across cores and increases the computation speed of the program. As a result, embedded CNN-based image classification meets its real-time requirements.

The main work of this thesis is as follows:
1. Study the FT-M7002 hardware and software platform, including the FT-M7002 architecture, the vector C instruction set, the assembly instruction set, and the FT-M7002 integrated development environment.
2. Study the CNN image classification algorithm in the darknet framework. For the five-class target classification model, identify the parts that can be ported to and optimized on the FT-M7002. Optimize the code and the algorithm implementation using both software and hardware knowledge to improve classification and recognition performance.
3. Vectorize the individual algorithms (convolution, pooling, normalization, bias addition, activation function, etc.), exploit data parallelism in vector computation, and measure the speedup ratio. In addition, apply the following measures to improve recognition performance:
   - use compiler optimization options and double buffering to hide part of the DMA transfer time behind computation;
   - reduce CPU-DMA memory conflicts according to the addressing mode of the vector memory space;
   - apply storage optimization to remove unnecessary memory accesses;
   - enable the cache to speed up accesses;
   - use loop unrolling and similar methods.
4. Exploit instruction-level and data-level parallelism in assembly, based on the structural characteristics and theoretical peak performance of the FT-M7002. This reduces DSP idle cycles and overcomes some limitations of vector C, so the FT-M7002 execution units are fully used. Combined with multi-core parallelism, this completes the performance optimization and yields a general optimization method on the FT-M7002 for image classification and for some target detection tasks. The method provides a reference for algorithm implementation, especially in applications with strict real-time requirements; following its steps yields better performance.
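The storage-level optimization described above sizes working sets to on-chip memory to raise the cache hit rate. One common way to do this is loop tiling of the matrix multiply that darknet uses for convolution (darknet lowers convolution to im2col followed by GEMM). The sketch below shows that technique; it is not the thesis's own code. The tile size `BS` and the function names `gemm_naive` and `gemm_tiled` are hypothetical. A real FT-M7002 port would choose `BS` from the actual on-chip memory and cache sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size: chosen so that a few BS x BS float tiles fit in
 * fast memory. Not a value taken from the thesis. */
#define BS 32

/* C[M][N] += A[M][K] * B[K][N], all row-major. Untiled reference loop. */
void gemm_naive(size_t M, size_t N, size_t K,
                const float *A, const float *B, float *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; ++i)
        for (size_t k = 0; k < K; ++k)
            for (size_t j = 0; j < N; ++j)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
}

/* Tiled version: the three outer loops walk BS-sized blocks so the data
 * touched by the inner loop nest stays small enough to remain in fast
 * memory, cutting repeated off-chip accesses. */
void gemm_tiled(size_t M, size_t N, size_t K,
                const float *A, const float *B, float *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t ii = 0; ii < M; ii += BS)
        for (size_t kk = 0; kk < K; kk += BS)
            for (size_t jj = 0; jj < N; jj += BS) {
                size_t ie = ii + BS < M ? ii + BS : M;  /* clamp edge tiles */
                size_t ke = kk + BS < K ? kk + BS : K;
                size_t je = jj + BS < N ? jj + BS : N;
                for (size_t i = ii; i < ie; ++i)
                    for (size_t k = kk; k < ke; ++k) {
                        const float a = A[i * K + k];   /* reused across j */
                        for (size_t j = jj; j < je; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

Each element of C still accumulates its K terms in ascending k order, so the tiled loop computes the same result as the naive one. Only the memory access pattern changes.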
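Item 3 above mentions double buffering to hide DMA transfers behind computation. The ping-pong pattern below is a minimal host-side sketch that rests on several assumptions:
- `dma_start` and `dma_wait` are hypothetical stand-ins for the FT-M7002's DMA interface; here they simply call `memcpy`.
- `TILE` is an assumed on-chip buffer size.
- The per-tile work is a plain sum, standing in for a convolution tile.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 64  /* hypothetical on-chip buffer size, in floats */

/* Hypothetical DMA stand-ins. On the host the copy is synchronous; on a DSP,
 * dma_start would only launch the transfer and dma_wait would block until
 * the outstanding transfer completes. */
static void dma_start(float *dst, const float *src, size_t n) {
    memcpy(dst, src, n * sizeof *dst);
}
static void dma_wait(void) { /* no-op in this host simulation */ }

/* Placeholder per-tile computation: sum of the elements. */
static float compute_tile(const float *buf, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += buf[i];
    return s;
}

/* Ping-pong buffering: while the core computes on buf[cur], the DMA fills
 * buf[cur ^ 1] with the next tile from external memory. */
float sum_double_buffered(const float *ext, size_t n) {
    static float buf[2][TILE];  /* stands in for on-chip memory */
    size_t ntiles = (n + TILE - 1) / TILE;
    float total = 0.0f;
    int cur = 0;

    assert(ext != NULL || n == 0);
    if (ntiles == 0)
        return 0.0f;

    dma_start(buf[cur], ext, n < TILE ? n : TILE);      /* prefetch tile 0 */
    for (size_t t = 0; t < ntiles; ++t) {
        size_t off = t * TILE;
        size_t len = n - off < TILE ? n - off : TILE;   /* last tile may be short */
        dma_wait();                                     /* tile t has arrived */
        if (t + 1 < ntiles) {                           /* fetch tile t+1 into the other buffer */
            size_t noff = off + TILE;
            dma_start(buf[cur ^ 1], ext + noff, n - noff < TILE ? n - noff : TILE);
        }
        total += compute_tile(buf[cur], len);           /* overlaps the DMA on real hardware */
        cur ^= 1;
    }
    return total;
}
```

On real hardware the benefit comes from `dma_start` returning immediately, so the transfer of tile t+1 proceeds while `compute_tile` runs on tile t. The host simulation only checks that the buffer indexing is correct.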
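The VLIW optimization described above depends on giving the compiler independent instructions that it can pack into the same cycle. The sketch below illustrates this with a dot product, the inner operation of convolution, chosen here as an assumed example. Unrolling by four with separate accumulators breaks the single dependency chain, so several multiply-adds can be scheduled together and the loop can be software-pipelined. The function names are hypothetical, and the unroll factor of 4 is an illustrative choice rather than a figure from the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one accumulator, so each multiply-add waits on the previous one. */
float dot_naive(const float *a, const float *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Unrolled by 4 with four independent accumulators. The four chains do not
 * depend on each other, so a VLIW scheduler can issue their multiply-adds
 * in parallel functional units. */
float dot_unrolled4(const float *a, const float *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)          /* remaining n % 4 iterations */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```

Splitting the sum reassociates floating-point additions. For general inputs, the two versions may therefore differ in the last bits.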
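The data-level parallelism described above applies the same operation to many elements at once. The sketch below strip-mines the bias-addition and activation step into fixed-width chunks, which is the shape that vector C code or an auto-vectorizer maps onto a vector unit. It is portable C, not FT-M7002 vector C. `VLEN` is a hypothetical lane count, and the channel-major layout is assumed. The activation is a leaky ReLU with slope 0.1, the slope darknet uses for its `leaky` activation.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 16  /* hypothetical vector width in float lanes */

static inline float leaky(float v) { return v > 0.0f ? v : 0.1f * v; }

/* Scalar reference: add the per-channel bias, then apply leaky ReLU.
 * Assumed layout: `channels` rows of `size` floats each. */
void bias_leaky_scalar(float *out, const float *bias, size_t channels, size_t size) {
    assert(out != NULL && bias != NULL);
    for (size_t c = 0; c < channels; ++c)
        for (size_t i = 0; i < size; ++i)
            out[c * size + i] = leaky(out[c * size + i] + bias[c]);
}

/* Strip-mined version: each channel is processed in VLEN-wide chunks in which
 * every lane does the same add/compare/select with no cross-lane dependence,
 * followed by a scalar tail for size % VLEN leftover elements. */
void bias_leaky_vec(float *out, const float *bias, size_t channels, size_t size) {
    assert(out != NULL && bias != NULL);
    for (size_t c = 0; c < channels; ++c) {
        float *row = out + c * size;
        const float b = bias[c];                 /* broadcast once per channel */
        size_t i = 0;
        for (; i + VLEN <= size; i += VLEN)
            for (size_t l = 0; l < VLEN; ++l)    /* one "vector" operation */
                row[i + l] = leaky(row[i + l] + b);
        for (; i < size; ++i)                    /* scalar remainder */
            row[i] = leaky(row[i] + b);
    }
}
```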
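The sketch below covers the multi-core step and the speedup ratio mentioned in item 3 above:
- `core_range` is hypothetical. It statically splits work items such as output channels or image rows into contiguous blocks, one per core.
- `speedup` uses the usual definition of the speedup ratio: baseline time divided by optimized time.
- The number of cores is a parameter because the abstract does not state the core count used in the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* Split `total` work items into `ncores` contiguous blocks; core `id` gets
 * the half-open range [*begin, *end). The first total % ncores cores take one
 * extra item, so per-core loads differ by at most one. */
void core_range(size_t total, size_t ncores, size_t id,
                size_t *begin, size_t *end) {
    assert(ncores > 0 && id < ncores && begin != NULL && end != NULL);
    size_t base = total / ncores, extra = total % ncores;
    *begin = id * base + (id < extra ? id : extra);
    *end = *begin + base + (id < extra ? 1 : 0);
}

/* Speedup ratio: time before optimization over time after optimization. */
double speedup(double t_baseline, double t_optimized) {
    assert(t_optimized > 0.0);
    return t_baseline / t_optimized;
}
```

For example, 10 output channels on 3 cores are split as [0, 4), [4, 7), and [7, 10). A hypothetical layer that took 12 ms before optimization and 3 ms after has a speedup ratio of 4.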

Keywords/Search Tags:

SIMD, vectorization, assembly, VLIW, image classification, speedup ratio, storage optimization, FT-M7002

Related items

1	Research On Vectorization Technology For Multi-cluster And VLIW DSP
2	OpenCV Transplantation And Optimization Based On FT-M7002
3	Research Of SIMD Vectorization Algorithm And Regrouping Technology
4	Research On SIMD Auto-vectorization Optimization Technologies
5	Research On SIMD Vectorization And Optimization Of Non-Multimedia Applications
6	Research On SIMD Vectorization Of Loop Nests And Its Optimization Techniques
7	Research On Profile-Guided SIMD Vectorization Identification And Optimization
8	YHFT-Matrix2 Compiler’s Technologies Related To SIMD Optimization Research And Implementation
9	Research And Implementation Of SIMD Compilation Optimization For BWDSP
10	Research On Vectorization Method For SIMD Super-long Vector Acceleration Components