
Research on Weight Compression and Fast Data Buffering for a Convolutional Neural Network Artificial Intelligence Chip

Posted on: 2022-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2518306602465264
Subject: Master of Engineering
Abstract/Summary:
With the application and development of artificial intelligence algorithms in medicine, finance, security, education, transportation, and other fields, AI computing power has become a benchmark for evaluating device performance, and more and more enterprises are developing dedicated AI chips. The design focus of an AI chip differs with its intended use and the complexity of the algorithms it targets: chips for big-data algorithms emphasize computing performance and data throughput, while chips for intelligent wearable devices emphasize low power consumption and small chip area.

The AI chip designed and developed by the internship unit targets low-power wearable application scenarios, focusing on improving performance within a low-power, small-area budget. Based on the UMC 65 nm process, the first version of the chip achieves a computing efficiency of 426 GMACS/W, an area of 4 mm², a power of 6 mW, and up to 38,000 classification computations per second. On the basis of the original design, this work adds a regional buffer, weight compression, and other techniques to improve the computing power of the chip. The new design contains tens of millions of gates and consumes less than 2 W.

Data communication within the chip has always been the bottleneck limiting the performance of von Neumann architecture chips. Convolutional neural networks carry a large amount of weight data and therefore demand substantial memory and data bus bandwidth, yet mobile chips are constrained by area and power and cannot provide the large on-chip caches and bus bandwidth of a GPU or CPU. Model compression is therefore used to reduce the total parameter count and computation of a convolutional neural network. Mainstream compression methods, such as model pruning, model quantization, and model decomposition, improve computational performance by reducing either the amount of computation or the total number of model parameters, but most existing work reports only software-platform simulation results. Designing a dedicated chip that directly supports a compressed weight-file format can further reduce power consumption, shorten computing time, and improve economic benefits. This thesis improves existing model pruning so that the pruned weight file skips multiple layers of software conversion and can be used more efficiently by hardware resources such as memory and computing units.

This thesis presents two hardware implementations of model compression on a reconfigurable AI chip. One is an efficient data buffer module that improves overall computing performance by compressing the total amount of data read from memory or cache. The other reduces model data size through quantization, which ordinarily causes a significant loss of accuracy; the proposed cubic quantization allows the MobileNet model to be quantized from double-precision floating point down to 8-bit fixed point while preserving the accuracy of model computation.

Mainstream deployment today targets mobile terminals, where most lightweight networks are based on the MobileNet framework; the weights of an ImageNet classification network built on this architecture total about 37 MB.
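The abstract does not specify the compressed weight-file format the hardware consumes. One common layout that hardware can decode directly is a run-length encoding of the zeros produced by pruning. The following Python sketch is purely illustrative: the function names and the (zero-run, value) pair layout are assumptions introduced here, not the thesis's actual format.

    import numpy as np

    def encode_pruned_weights(weights: np.ndarray, threshold: float = 0.0):
        # Encode a pruned weight tensor as (zero_run_length, value) pairs.
        # Weights with magnitude <= threshold are treated as pruned (zero),
        # so the long zero runs created by pruning collapse to one count.
        flat = weights.flatten()
        encoded = []          # list of (zeros_skipped, nonzero_value)
        zero_run = 0
        for w in flat:
            if abs(w) <= threshold:
                zero_run += 1
            else:
                encoded.append((zero_run, float(w)))
                zero_run = 0
        return encoded, zero_run  # trailing zeros returned separately

    def decode_pruned_weights(encoded, trailing_zeros, shape):
        # Reconstruct the dense tensor from the run-length encoding.
        out = []
        for zeros, value in encoded:
            out.extend([0.0] * zeros)
            out.append(value)
        out.extend([0.0] * trailing_zeros)
        return np.array(out).reshape(shape)

    # Example: a heavily pruned layer compresses to two pairs.
    w = np.zeros((4, 4)); w[0, 1] = 0.5; w[2, 3] = -0.25
    enc, tail = encode_pruned_weights(w)
    assert np.allclose(decode_pruned_weights(enc, tail, w.shape), w)

A layout like this lets the memory controller skip pruned weights entirely instead of reading zeros, which is one way a compressed weight file could reduce the total data read from memory, as the abstract describes.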
For a chip with a von Neumann architecture, however, the size of the on-chip cache determines performance, and enlarging the cache increases not only chip area and power consumption but also cost. We therefore add an efficient data buffer control mechanism and a quantization method to improve effective data bus bandwidth, and use int8 multipliers to reduce data storage size and thus shrink the on-chip cache. The design accordingly faces two problems: 1. how to buffer data efficiently; 2. how to guarantee accuracy when quantizing to 8-bit fixed point. Compared with a CPU and GPU, the effective data bus bandwidth of the chip is increased by 80% and the computing time is reduced by 10×. The precision loss of converting double-precision floating point to 8-bit fixed point is handled by cubic quantization, in which the data is multiplied three times. Through pipelining, the computation time for each batch of data increases by only 5 cycles, the final classification results differ from the double-precision floating-point results by less than 5%, and the weight data shrinks to one eighth of its original size. To verify the correctness of the quantized computation and its data accuracy, we also designed a high-level software tool chain covering data preparation, instruction generation, result prediction, and optimization. The total computing time of the MobileNet classification network is 46 ms.
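The abstract describes cubic quantization only as multiplying the data three times in a pipelined datapath, so the exact scheme cannot be reconstructed from it. The sketch below is one speculative reading, in which the dequantization scale is split into three equal stage factors; quantize_int8, dequantize, and the cube-root split are all assumptions introduced here for illustration. It does, however, show the one-eighth storage reduction of int8 relative to double precision reported above.

    import numpy as np

    def quantize_int8(x: np.ndarray):
        # Symmetric per-tensor quantization of float64 data to int8.
        # Speculative reading of "cubic quantization": the overall scale
        # is split into three equal factors that a pipelined datapath
        # could apply as three successive multiplications.
        scale = np.abs(x).max() / 127.0
        s = np.cbrt(scale)                       # s * s * s == scale
        q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
        return q, (s, s, s)

    def dequantize(q: np.ndarray, stages):
        # Apply the three stage multiplications to recover approximate floats.
        x = q.astype(np.float64)
        for s in stages:                         # one multiply per pipeline stage
            x = x * s
        return x

    x = np.random.randn(1000)                    # double-precision data
    q, stages = quantize_int8(x)
    err = np.abs(dequantize(q, stages) - x).max()
    print(f"int8: {q.nbytes} B vs float64: {x.nbytes} B; max error {err:.4f}")

In this sketch the int8 buffer occupies 1,000 bytes against 8,000 bytes for the double-precision original, matching the eightfold weight-data reduction the abstract reports; the three-stage multiply is only a plausible mapping of "the data is multiplied three times" onto a pipeline.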
Keywords/Search Tags:Reconfigurable artificial intelligence chip, convolutional neural network, data buffer, weight compression