
Design And Implementation Of VGG Convolution Network On Multi-core Vector Processor

Posted on: 2021-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Wu
Full Text: PDF
GTID: 2518306047986049
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of artificial intelligence and machine learning, convolutional neural networks (CNNs), a representative deep learning algorithm, have made image recognition and classification tasks readily achievable. They are considered among the most effective image processing methods at present and have been widely used in natural language processing, computer vision, and other fields. As the accuracy requirements of image classification have risen, researchers have proposed more complex CNN models, which has increased both the number of parameters and the amount of computation. CNNs therefore demand higher computational performance and memory bandwidth from the processor. Traditional processors can no longer meet these requirements, which has driven processor architectures toward multi-core, many-core, heterogeneous GPU, embedded-chip, and other designs. Accelerating CNN computation according to the architectural characteristics of the processor has become an active research topic in academia.

Based on the current state of development of CNNs and related processors in China and abroad, this thesis systematically compares and analyzes the advantages and disadvantages of various accelerators, and discusses the main factors affecting algorithm performance as well as methods for optimizing assembly code. It then proposes an efficient implementation and optimization method for the VGG neural network model, based on multi-core parallelism and vectorization and targeted at the architecture of the multi-core vector accelerator Matrix2. The main research work is as follows.

Firstly, a series of algorithm optimization methods is proposed for the VGG network model:
(1) According to the characteristics of the Matrix2 vector processor architecture, the multidimensional convolution calculation is converted into an efficient vectorized matrix multiplication, and a vectorized calculation method based on row-wise computation is designed to make efficient use of the FMAC units.
(2) According to the characteristics of the VGG16 network model, optimized vectorization methods for the pooling and fully connected layers are designed using data layout and software pipelining.
(3) According to the calculation and data transmission characteristics of the convolutional, pooling, and fully connected layers, a corresponding optimized data transmission method based on DMA double buffering is designed. It overlaps calculation time and data transmission time as far as possible and effectively improves the overall calculation performance of the network model.
(4) According to the convolution kernel parameter sharing of the VGG16 network model, an efficient multi-core parallel implementation of VGG16 is designed by means of multi-core data layout, DDR peripheral address division, data synchronization, batch processing, and multicast DMA.

Secondly, a program built with the GCC compiler generates the input data for the neurons in each layer of the VGG16 network, and a VGG16 network built with GCC calculates the corresponding output. In addition, the Matrix2 software simulation environment, built with NC-Verilog and run under the Linux operating system, is used to simulate and debug the VGG16 assembly code and to verify that the outputs of the two environments are consistent.

Finally, the VGG16 network model is mapped onto the Matrix2 processor and performs image recognition with trained convolution kernels to simulate a real environment. The results show that the proposed optimization method raises kernel computation efficiency to 93% and achieves an overall performance of 115 frames per second. In summary, the design and optimization method for the multi-core parallel VGG16 network proposed in this thesis achieves high computational efficiency and provides a useful reference for porting other deep learning algorithms.
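
The conversion of convolution into matrix multiplication mentioned in item (1) can be illustrated with a minimal sketch. This is not the thesis's Matrix2 assembly implementation; it is a plain-Python illustration of the general "im2col" idea, assuming a single-channel image, stride 1, no padding, and square kernels. All function names (`im2col`, `matmul`, `conv2d_direct`, `conv2d_as_matmul`) are hypothetical and chosen only for this example.

```python
def im2col(image, k):
    """Unfold every k x k patch of a 2-D image into one row of a matrix."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def matmul(a, b):
    """Plain matrix product; each output row is built from one row of a."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv2d_direct(image, kernel):
    """Reference sliding-window convolution (cross-correlation, as in CNNs)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj]
             for di in range(k) for dj in range(k))
         for j in range(w - k + 1)]
        for i in range(h - k + 1)
    ]


def conv2d_as_matmul(image, kernels):
    """Apply several k x k kernels at once as a single matrix product.

    Returns a matrix with one row per output pixel and one column per kernel.
    """
    k = len(kernels[0])
    patches = im2col(image, k)                      # (out_h*out_w) x (k*k)
    weights = [[kern[di][dj] for kern in kernels]   # (k*k) x n_kernels
               for di in range(k) for dj in range(k)]
    return matmul(patches, weights)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernels = [
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],   # horizontal difference filter
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],      # identity (picks the centre pixel)
]
result = conv2d_as_matmul(image, kernels)
```

Once the patches are laid out as matrix rows, the inner loop becomes a sequence of multiply-accumulate operations over contiguous data, which is the kind of access pattern a vector unit can exploit. How the thesis maps these rows onto the Matrix2 vector lanes is not specified in the abstract.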
Keywords/Search Tags: Multi-core Vector Accelerator, Convolutional Neural Network, VGG16, Parallel, Vectorization