
Optimal Design Of Multiplication Array Circuit For Neural Network Application

Posted on: 2024-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: W J Li
Full Text: PDF
GTID: 2568307097958099
Subject: Electronic information
Abstract/Summary:
With the wide application of convolutional neural networks (CNNs), the data volume and computational complexity of their models keep growing, which places higher demands on hardware acceleration for CNNs. The multiply-accumulate (MAC) array, the circuit module that implements the convolution operation, accounts for most of the hardware cost of the whole network, and its performance and power consumption have a crucial impact on the overall design. This thesis studies the data reuse (multiplexing) characteristics of the convolution operation and the structure of Booth-algorithm multiplier circuits, and designs and implements a high-energy-efficiency MAC array based on a radix-16 Booth pre-computing circuit and input register sharing, to meet the demand for energy-efficient MAC arrays.

Firstly, the generation of partial products in the radix-16 Booth algorithm is analyzed, and the corresponding circuit module is designed and implemented. A circuit-sharing structure for the MAC array is then proposed by combining the data reuse characteristics of the convolution operation with the structure of the partial product generation circuit: the pre-computing circuit modules and their input registers are shared within the MAC array to reduce overall circuit power consumption.

Secondly, a Wallace Tree compression structure is designed to accumulate the multiple partial products. To avoid the redundant calculations in the partial product supplementary addition of a conventional multiplier, the high and low partial products output by the multiple selection circuit modules are sent directly to the compression tree. The high and low partial products are compressed separately, and the supplementary addition is performed only after compression is complete, which reduces the hardware overhead of the Wallace Tree.

Thirdly, in the logic synthesis stage, the MAC array structure and the process library model are analyzed, and a power-optimization synthesis flow based on switching activity is proposed according to the characteristics of this design, further reducing the power consumption of the MAC array.

Finally, the physical design of the MAC array is completed, with solutions proposed for the problems encountered at each implementation stage. Verification has been completed, and the MAC array has been optimized and closed in terms of performance, area, and power consumption.

With the above optimizations, a highly energy-efficient MAC array is implemented. Place and route (PR) of the design is completed in the SMIC 40 nm process, with a final area of 2.98 mm². When processing the Conv1 convolutional layer of the ResNet18 network, the energy efficiency reaches 2.41 TOPS/W and the area efficiency reaches 206 GOPS/mm². Compared with other current CNN accelerators, the energy efficiency is 14.2% and 26.2% higher. The MAC array designed here, as the core of the convolution operation, therefore offers higher performance and certain advantages among current convolution operation structures.
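The radix-16 Booth recoding and the shared pre-computing stage described in the abstract can be illustrated with a small behavioral model. This is a minimal sketch and not the thesis's circuit: the 8-bit operand width, the choice of which operand is shared across multipliers, and all function names (`booth16_digits`, `precompute_multiples`, `booth16_multiply`, `mac_row`) are assumptions made for illustration only.

```python
def booth16_digits(y, bits=8):
    """Recode a signed `bits`-wide integer into radix-16 Booth digits in [-8, 8].

    Digit i is -8*y[4i+3] + 4*y[4i+2] + 2*y[4i+1] + y[4i] + y[4i-1], with y[-1] = 0.
    """
    assert bits % 4 == 0
    assert -(1 << (bits - 1)) <= y < (1 << (bits - 1))
    u = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1]
    for i in range(0, bits, 4):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        b2 = (u >> (i + 2)) & 1
        b3 = (u >> (i + 3)) & 1
        digits.append(-8 * b3 + 4 * b2 + 2 * b1 + b0 + prev)
        prev = b3
    return digits


def precompute_multiples(x):
    """Model of a pre-computing stage: the multiples 0x..8x of one operand.

    Only 3x, 5x and 7x need an adder or subtractor; the rest are shifts.
    """
    x3 = x + (x << 1)
    x5 = x + (x << 2)
    x7 = (x << 3) - x
    return [0, x, x << 1, x3, x << 2, x5, x3 << 1, x7, x << 3]


def booth16_multiply(multiples, y, bits=8):
    """Multiply using precomputed multiples of x and the Booth digits of y."""
    total = 0
    for i, d in enumerate(booth16_digits(y, bits)):
        pp = multiples[abs(d)]       # selection step: pick |d| * x
        if d < 0:
            pp = -pp                 # negative digit selects the negated multiple
        total += pp << (4 * i)       # weight of digit i is 16**i
    return total


def mac_row(x, weights, bits=8):
    """Assumed sharing scheme: one precompute of x serves several multipliers."""
    multiples = precompute_multiples(x)  # computed once, reused for every weight
    return [booth16_multiply(multiples, w, bits) for w in weights]
```

In this model an 8-bit multiplier yields only two partial products per multiplication, and `mac_row` pays for the hard multiples (3x, 5x, 7x) once per shared operand rather than once per multiplier, which is the kind of saving the shared pre-computing circuit is aiming at. The plain integer sum in `booth16_multiply` stands in for the Wallace Tree compression.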
Keywords/Search Tags: MAC array, Booth algorithm, Wallace Tree, Logic synthesis, Physical implementation