Research On Deep Learning Technology Of Target Detection For Heterogeneous Computing

Posted on: 2022-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Zhang
GTID: 1488306353476124
Subject: Computer Science and Technology
Abstract/Summary:
Image target detection is widely applied in remote sensing, autonomous driving, medical diagnosis, object tracking, and vehicle traffic detection, and it is an important research direction in digital image processing. Deep-learning-based image target detection relies mainly on convolutional neural network models that learn image features from large datasets for classification and detection. Although deep learning target detection outperforms traditional detection algorithms in many applications, the large number of parameters in convolutional network models leads to a heavy computational load and demands fast hardware, which makes the models difficult to run on embedded mobile terminals. Against this backdrop, this thesis focuses on the key technical issues of deep learning for image target detection in a heterogeneous computing environment: a lightweight deep learning network model for target detection, evaluation of computation and transmission time during model training, loop unrolling optimization of model training on the GPU, and hardware acceleration of the computing model based on an FPGA template structure.

Firstly, in view of the high computational complexity of deep learning network models for target detection, this thesis proposes a lightweight target detection network model based on depthwise separable convolution. Starting from the Tiny YOLO network model, the model is optimized with depthwise separable convolution. After optimization, the number of network parameters is greatly reduced and the detection speed is significantly improved, while the detection accuracy remains essentially unchanged. Channel compression is then performed on the model. By analyzing and quantifying the number of parameters and the theoretical amount of computation of every channel in the model, redundant channels with little impact on the performance of the network model are identified and deleted, which further simplifies the network model. The compressed network model has a significantly smaller weight file and shorter convolution time, and its detection rate is higher, which meets the needs of target detection on embedded mobile terminals.

Secondly, in view of the problem that long data computation and transmission times in convolutional neural network training hinder rapid iteration in model optimization, a modeling and evaluation method is proposed for data computation and transmission time in a CPU-GPU heterogeneous computing environment. For devices with no implicit and dual copy engines, this thesis models and analyzes the time consumed by data computation and data transmission in the communication between the host and the device. For data computation, the model takes full account of the GPU chip resources and the execution time of the kernel. For data transmission, the LogGP-based evaluation of data transmission is improved according to the access mechanism. For overall data communication, the thesis analyzes the communication time between the host and the device under different communication scenarios and gives the corresponding time consumption model. With this modeling and evaluation of computation and transmission time, the communication time between the host and the device can be estimated more accurately, and the time bottleneck can be located through analysis to guide network model optimization.

Thirdly, loop unrolling in deep learning network model training involves complex, mutually constraining factors and is also limited by the GPU hardware architecture, the programming model, and on-chip resources. To handle this, a GPU-oriented cost model for the loop unrolling factor is proposed. By studying the factors that affect loop unrolling, the thesis analyzes the impact of instruction-level parallelism and GPU occupancy balance on loop unrolling. The model fully considers the effect of GPU hardware architecture parameters on loop unrolling constraints, iteration time, and execution performance. The number of cycles consumed in each loop body is analyzed to derive the optimal loop unrolling configuration, and the optimal loop unrolling factor is found, which effectively reduces the training time of the deep learning model on the GPU and improves execution efficiency.

Finally, in view of the difficulty of generating an efficient hardware logic structure in FPGA hardware acceleration design, an OpenCL cyclic flow computing model oriented to deep learning on FPGAs is proposed, which can quickly generate an efficient hardware logic structure. The model can optimize the time-consuming convolution layers in deep learning and choose reasonable design parameters without the help of third-party tools, quickly reducing iterative OpenCL code. The efficient hardware logic structure is generated on the basis of the cyclic flow computing model, which effectively solves the problems of repeatedly reading and writing data in the FPGA off-chip global memory and of low access efficiency. The model can be applied to convolutional layers of different scales and improves execution performance.
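The parameter saving that the first contribution attributes to depthwise separable convolution can be illustrated with a short calculation. The sketch below is not taken from the thesis: the function names and the layer sizes (256 input channels, 512 output channels, 3 x 3 kernel) are hypothetical, and bias terms are ignored. It only compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k depthwise convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 256, 512, 3

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1/c_out + 1/k**2 in closed form

print(f"standard:  {std} weights")
print(f"separable: {sep} weights")
print(f"ratio:     {ratio:.4f}")
```

For a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is consistent with the large reduction in parameters described above. The accuracy impact cannot be derived from this count and has to be measured.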
Keywords/Search Tags:Deep learning, heterogeneous computing, target detection, convolutional neural network, model optimization