| The growing prevalence of chronic conditions such as cardiovascular disease in a deeply aging society has led to a serious shortage of high-quality medical resources. Building an intelligent medical service system that uses wearable devices supported by artificial intelligence algorithms as its entry point can effectively extend high-quality medical resources to primary care. Automatic classification and diagnosis of electrocardiogram signals at the primary-care level is of great significance for providing real-time, accurate electrocardiogram detection services and helps detect and guard against sudden cardiovascular events in real time. It has therefore received widespread attention from academia and industry worldwide. Targeting electrocardiogram classification on wearable devices, this thesis takes low resource consumption and high hardware efficiency as its core design goals. Starting from the theory and model optimization of convolutional neural network algorithms, it proposes an efficient convolutional neural network processor architecture based on optimized designs of key circuits such as the processing element and the array structure, and it completes the ASIC design and chip testing of an unstructured sparse one-dimensional convolutional neural network. The main research content of this thesis is as follows. Firstly, a lightweight one-dimensional convolutional neural network model is designed based on the one-dimensional nature of electrocardiogram signals. A global average pooling layer averages and reduces the dimensionality of the feature maps produced by the convolutional layers; compared with the traditional flatten operation, this reduces the overall parameter count of the fully connected layer by a factor of about 16.8, which effectively alleviates the mismatch in computing density between the convolutional layer and the fully connected 
layer. Based on the publicly available MIT-BIH database, the network is trained, validated, and tested, and the overall classification accuracy for five common heartbeat types reaches 99.13%. The network is then further optimized using unstructured sparse pruning. It is pruned and trained at nine different target sparsity levels ranging from 0% to 90%. Statistical analysis of the test results shows that the optimal target sparsity is 70%, at which the overall parameter count of the model is reduced by a factor of about 3.2 while a heartbeat classification accuracy of 99.00% is achieved. Secondly, for the four-level nested multiply-accumulate loop structure of the one-dimensional convolution operation, the hardware circuit structure that results from unrolling each loop level is analyzed in detail. Taking resource overhead, storage bandwidth, computing latency, and data access time into account, the hardware is implemented by unrolling the 1st, 2nd, and 4th loop levels in parallel. A dynamic activation circuit based on the sign bit within the processing element precisely eliminates redundant multiplications for zero-valued feature map data, effectively reducing power consumption and hardware resource costs. A cascaded structure is designed with the processing element as its core, and a processing element array is constructed to compute loop levels 1, 2, and 4 in parallel across time and space. An efficient data flow is designed for the pipelined computing mode of the processing element array, and a one-dimensional convolutional neural network processor system architecture with high hardware efficiency is proposed. In addition, by analyzing the computational principle of global average pooling, an algorithm optimization that is amenable to hardware implementation is proposed. Without 
affecting computation latency or accuracy, the division operation is replaced by a shift operation, reducing the complexity of the hardware implementation. The hardware implementation and performance testing of the one-dimensional convolutional neural network processor are completed on an FPGA development platform. The processor achieves a classification accuracy of 99.10% for five heartbeat types, and the hardware implementation consumes only 1538 lookup tables and 2796 flip-flops. At a 200 MHz clock, the classification of a single heartbeat can be completed within 40 μs, enabling real-time classification of electrocardiogram signals. Throughput reaches 25.7 GOP/s, corresponding to a lookup-table hardware efficiency of 16.71 GOP/s/kLUT. Thirdly, for the unstructured sparse pruned one-dimensional convolutional neural network model, an efficient compressed storage format is designed for randomly distributed sparse weights. The method uses blocking and tiling to form non-zero weight tiles, eliminating zero data from convolutional kernels and weight matrices. At the optimal target sparsity of 70%, the storage capacity required for weight parameters is reduced by about 40%. Based on a thorough analysis of two sparse optimization strategies, zero skipping and dynamic gating, the former is applied to the fixed weight parameters and the latter to the dynamically changing feature map data. A tile-first data flow is proposed that performs the effective multiply-accumulate calculations of the convolutional layers of a sparse one-dimensional convolutional neural network in a pipelined manner. An index matching module based on two-stage shift registers is also proposed; while reducing the cost of additional hardware resources, it uses the weight index in each weight block to efficiently match non-zero weight parameters with feature map data. By optimizing the processing element circuit and array structure, the efficient implementation of 
convolution and matrix multiplication in the compressed storage format is supported. Finally, an unstructured sparse convolutional neural network chip that is compatible with different network model parameters is implemented, and its flexibility is improved by a 32-bit instruction format. The chip is fabricated in the SMIC 40 nm process and tested, with a core area of 2.044 mm² (1.4 mm × 1.46 mm), and achieves a classification accuracy of 98.99% for five heartbeat types. At the optimal operating frequency of 2 MHz, the chip consumes 3.666 μJ to complete a single heartbeat classification, with an inference latency of 4.683 ms. The key technologies proposed in this thesis have research significance and application potential for real-time electrocardiogram classification and diagnosis on wearable devices, laying a theoretical and practical foundation for the development of lightweight AI algorithms and processors for electrocardiogram classification. |
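
The abstract states that the division in global average pooling is replaced by a shift operation but does not give the details. The following is a minimal Python sketch of one way this could work, assuming the pooled feature-map length is a power of two and the data are integers (fixed-point). The function names `gap_divide` and `gap_shift` and the sample values are hypothetical and are not taken from the thesis.

```python
def gap_divide(channel):
    """Reference global average pooling: integer sum, then floor division."""
    return sum(channel) // len(channel)


def gap_shift(channel):
    """Global average pooling with the division replaced by a right shift.

    Assumption (not stated in the abstract): the channel length is a power
    of two, so dividing by it equals shifting right by log2(length) bits.
    """
    n = len(channel)
    if n == 0 or n & (n - 1):
        raise ValueError("channel length must be a power of two")
    shift = n.bit_length() - 1  # log2(n) for a power of two
    # Python's >> on integers is an arithmetic shift (floors toward -inf),
    # matching floor division for negative sums as well.
    return sum(channel) >> shift


# Two hypothetical 8-sample channels of integer feature-map data.
feature_map = [
    [3, 0, 7, 2, 0, 5, 1, 6],
    [-4, 9, 0, 0, 8, -3, 2, 4],
]
pooled = [gap_shift(ch) for ch in feature_map]
print(pooled)  # [3, 2]
```

In hardware, a shift by a constant amounts to selecting different wires of the accumulator output, so under this assumption the pooling stage needs no divider. If the feature-map length were not a power of two, some other approach would be required; the abstract does not say which case applies.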