
Research On NOR Flash-Based Energy-Efficient In-Memory Neuromorphic Computing Architecture

Posted on: 2021-04-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Xu
Full Text: PDF
GTID: 1368330605979464
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Various devices and the mass data brought by edge computing and on-device inference have made it practical for neural networks to deeply influence the way we live and work today. However, the contradiction between the enormous computing power that neural networks require and the low power budgets of edge and end devices seriously hampers their development and application. Low-power, energy-efficient neural network accelerators have therefore become a focus of many enterprises and research institutions. Several approaches have been attempted: merging, compressing, and pruning neural networks; designing domain-specific architectures for neural networks; and developing new computing paradigms. Among these, in-memory computing, a novel paradigm inspired by neuroscience, has attracted much attention for its incomparable advantage in energy efficiency.

Traditional neural network accelerators built from digital logic circuits mostly follow the von Neumann architecture, in which the processing unit and the memory unit are separate. When computing data-intensive applications such as neural networks, data and results must be moved back and forth between the processing unit and the memory unit, which makes memory bandwidth a bottleneck of the accelerating system; this back-and-forth movement of data is also one of the system's main sources of power consumption. In-memory computing is instead implemented on non-volatile memories such as NOR flash memories and memristors: in addition to storing data, the memory cells also serve as processing units. In-memory computing therefore avoids the repeated movement of large quantities of data that burdens traditional digital accelerators, and with it the bandwidth bottleneck and the energy consumption incurred by memory access. It has thereby become one of the most promising implementations of low-power, energy-efficient neural network accelerators.
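To make the principle concrete (an illustration, not circuitry from the dissertation): in such an array, weights are programmed as cell conductances, inputs are applied as wordline voltages, and each bitline current is a dot product by Kirchhoff's current law, so a whole matrix-vector multiply happens in place. A minimal NumPy sketch, with the quantization levels and noise model assumed for illustration:

import numpy as np

def analog_mvm(weights, x, n_levels=16, read_noise=0.01, rng=None):
    """Idealized in-memory matrix-vector multiply on a crossbar-style array.

    weights : (rows, cols) matrix, mapped onto cell conductance states
    x       : (rows,) input vector, applied as wordline voltages
    Returns the per-bitline currents, i.e. an approximate weights.T @ x.
    """
    if rng is None:
        rng = np.random.default_rng()
    w_max = np.abs(weights).max()
    # Quantize each weight to one of n_levels programmable conductance states.
    g = np.round(weights / w_max * (n_levels - 1)) / (n_levels - 1)
    # Each bitline sums the currents g_ij * v_i of its cells (Kirchhoff's law).
    i_bitline = g.T @ x
    # Model device variation as multiplicative read noise.
    i_bitline *= 1.0 + read_noise * rng.standard_normal(i_bitline.shape)
    return i_bitline * w_max  # rescale back to weight units

y = analog_mvm(np.random.randn(64, 32), np.random.rand(64))

An ADC must then digitize each bitline result; how much precision that ADC has to resolve is exactly what separates the binarized and multi-bit schemes discussed below.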
However, existing neural-network-oriented in-memory computing schemes have several problems. First, many in-memory computing accelerators are built on fully binarized neural networks; such extreme compression introduces a large accuracy loss in complex applications. Second, existing in-memory computing accelerators for multi-bit neural networks mostly rely on multi-level memory cells, or on several memory cells per multi-bit operation, which incurs greater area overhead, higher energy consumption, lower speed, and higher cost, making them less competitive than accelerators based on binarized networks. Third, because the memory process constrains the digital process, it is difficult to improve the speed, energy consumption, area, and scale of the digital circuits in an in-memory computing accelerator, which cannot benefit from a more advanced digital process. Fourth, existing accelerators mostly target deep convolutional neural networks with regular structure and lack support for emerging graph convolutional networks with irregular structure, for which the usual cache strategy does not work at all.

In view of these problems, this dissertation proposes a heterogeneous, hybrid-precision in-memory computing scheme to achieve a flexible and practical in-memory computing neural network accelerating system with high accuracy and low power consumption, and optimizes the cache strategy for the irregularity of emerging graph convolutional networks. The contributions of this dissertation are as follows:

(1) To avoid both the excessive accuracy loss of in-memory computing accelerators based on binarized neural networks and the higher resource and energy consumption of those based on multi-bit neural networks, this dissertation proposes a hybrid-precision in-memory computing scheme, which optimizes the network algorithm for the characteristics of in-memory computing, building on quantized and binarized neural networks, and implements a hybrid-precision neural network. Unlike full-precision networks using floating-point numbers, quantized networks using fixed-precision numbers, or binarized networks using binary numbers, a hybrid-precision network quantizes the more important layers to higher precision and binarizes the many less important layers. It thus sustains a high binarization ratio without substantial accuracy loss relative to the full-precision network, so that computation in in-memory hardware achieves almost the same energy efficiency and performance as accelerators based on fully binarized networks; and since most of the computation is binarized, energy-efficient, and fast, energy consumption is greatly reduced compared to accelerators based on fully multi-bit networks (a sketch of this layer-wise precision assignment follows contribution (2) below).

(2) To support these hybrid-precision networks, this dissertation optimizes the existing in-memory computing architecture and proposes a hybrid-precision neural network accelerating architecture based on in-memory computing. The architecture implements both binarized and multi-bit in-memory computing, can simultaneously support the operations of multi-bit quantized networks and binarized networks, and therefore supports hybrid-precision networks. For the different characteristics of binarized and multi-bit in-memory computing, a variety of circuit-level optimization techniques are proposed to reduce circuit noise and save area and power. A dual-mode ADC is proposed with two operating modes, voltage sampling and current sampling, which meet the measurement requirements of multi-bit and binarized in-memory computing respectively: the voltage-sampling mode offers higher precision at higher power and suits the measurement of multi-bit results, while the current-sampling mode offers lower precision, faster speed, and lower power and suits the measurement of binarized results. Owing to the optimizations in the hybrid-precision network, the hybrid-precision architecture, and the circuit design, the scheme achieves high accuracy, low power consumption, and high energy efficiency. Based on these optimizations, an in-memory computing chip was designed and taped out in a 65 nm NOR flash process; experimental results show an energy efficiency of 2.15 TOPS/W and an accuracy of 93-98% across various neural networks and datasets.
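To illustrate the layer-wise precision assignment of contribution (1) (a hedged sketch, not the dissertation's algorithm: the binarization rule, the importance scores, and all names below are assumptions):

import numpy as np

def binarize(w):
    """1-bit weights: sign times mean magnitude (XNOR-Net-style scaling)."""
    return np.sign(w) * np.abs(w).mean()

def quantize(w, bits=8):
    """Uniform symmetric fixed-point quantization to `bits` bits."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def hybrid_precision(layers, importance, keep_ratio=0.2, bits=8):
    """Quantize the most important layers to multi-bit; binarize the rest.

    layers     : list of per-layer weight arrays
    importance : per-layer scores (e.g. accuracy drop when binarized)
    keep_ratio : fraction of layers kept at multi-bit precision
    """
    n_keep = max(1, int(len(layers) * keep_ratio))
    keep = set(np.argsort(importance)[-n_keep:])  # most important layers
    return [quantize(w, bits) if i in keep else binarize(w)
            for i, w in enumerate(layers)]

layers = [np.random.randn(16, 16) for _ in range(10)]
importance = np.random.rand(10)   # placeholder scores for the sketch
mixed = hybrid_precision(layers, importance)

In the literature the precision-sensitive layers are often the first and last ones; under such a split, the binarized majority runs in the array's fast current-sampling mode while the few multi-bit layers use the voltage-sampling mode.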
(3) To solve the problem that the limitation on the digital process prevents core units from benefiting from a more advanced process, a digital/in-memory-computing heterogeneous neural network accelerating system is proposed in this dissertation. The system allocates tasks with high parallelism, low control complexity, and low precision requirements to an in-memory computing module, and allocates the remaining tasks to a more flexible digital module. Meanwhile, the analog-to-digital converters, which account for most of the system's power consumption, are implemented in the digital module, greatly improving performance and energy efficiency. To achieve this, an FPGA-based analog interconnection is proposed, which uses the abundant SERDES resources on the FPGA as high-speed ADCs for measuring in-memory computing results, yielding fewer connections and higher performance. Experimental results on various neural networks and datasets show that the performance of the system improves by a factor of 26.7 and the energy efficiency by a factor of 5.8, without loss of accuracy.

(4) Compared with common deep neural networks, the irregular memory access of emerging graph convolutional networks defeats the usual cache strategy and greatly reduces memory access efficiency, which is critical for graph convolutional networks. To solve this problem, this dissertation proposes a resource-optimized, history-table-based approximate learning cache strategy. Since present graph convolutional networks are mostly static, a history table can be built by learning at runtime, and the data with the highest profit can be selected for caching; through approximation, the cost in scarce on-chip memory resources is greatly reduced.
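A minimal sketch of the task-allocation rule in contribution (3): highly parallel, low-control, low-precision layers go to the in-memory module, everything else to the digital module. The layer attributes and thresholds are assumptions for illustration, not the dissertation's actual scheduler:

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    parallelism: float   # e.g. MACs per output element (assumed metric)
    control: float       # branching / data-dependence score (assumed metric)
    bits: int            # operand precision the layer requires

def allocate(layers, min_par=64, max_ctl=0.2, max_bits=4):
    """Split work between the in-memory array and the digital module."""
    imc, digital = [], []
    for layer in layers:
        fits_imc = (layer.parallelism >= min_par
                    and layer.control <= max_ctl
                    and layer.bits <= max_bits)
        (imc if fits_imc else digital).append(layer.name)
    return imc, digital

imc, digital = allocate([
    Layer("conv1", parallelism=1152, control=0.0, bits=1),    # -> in-memory
    Layer("softmax", parallelism=10, control=0.8, bits=16),   # -> digital
])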
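And for contribution (4), a sketch of the history-table-based approximate learning cache: vertex accesses are learned at runtime in narrow saturating counters (the approximation that spares scarce on-chip memory), and the highest-profit vertices are pinned in the on-chip buffer. Counter width, table size, and all names are assumptions:

import numpy as np

class ApproxHistoryCache:
    """History-table cache for a static graph's feature accesses.

    Access counts live in narrow saturating counters (approximate, to
    save on-chip memory); rebuild() pins the highest-profit vertices.
    """
    def __init__(self, n_vertices, cache_lines, counter_bits=4):
        self.counts = np.zeros(n_vertices, dtype=np.uint8)
        self.cap = (1 << counter_bits) - 1      # saturation value
        self.cache_lines = cache_lines
        self.pinned = set()

    def record(self, vertex):
        # Learn the access pattern at runtime; saturate instead of overflowing.
        if self.counts[vertex] < self.cap:
            self.counts[vertex] += 1

    def rebuild(self):
        # Pin the most frequently accessed (highest-profit) vertices.
        top = np.argsort(self.counts)[-self.cache_lines:]
        self.pinned = {int(v) for v in top}

    def lookup(self, vertex):
        self.record(vertex)
        return vertex in self.pinned            # True = on-chip hit

cache = ApproxHistoryCache(n_vertices=10_000, cache_lines=256)

Because the graph is static, the table converges quickly and a periodic rebuild() suffices; the trade-off is that saturated counters can no longer distinguish the very hottest vertices from merely hot ones.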
Keywords/Search Tags: In-Memory Computing, Neuromorphic Computing, High Energy Efficiency, Heterogeneous Computing