| With the development of artificial intelligence, speech recognition has become an important research field in artificial intelligence applications and is widely used in smart homes, smart cars, and social chat. To be deployed on terminal devices, however, speech recognition must run in real time with low power consumption and low cost. This paper proposes an end-to-end speech recognition hardware acceleration system to meet these needs. At the algorithm level, a fixed-point quantization scheme and Top-k pruning reduce the number of parameters and computations of the neural network model, which lowers on-chip memory consumption and makes hardware acceleration feasible. At the hardware level, first, reconfigurable computing modules that support both sparse and dense matrix operations are proposed; second, non-linear operations such as the sigmoid, tanh, and softmax functions are implemented efficiently; finally, a corresponding dataflow is developed to reuse the weight matrix along the time dimension, which relieves the pressure on the limited off-chip DRAM bandwidth and reduces the power consumed by external memory accesses. At the software level, the flexibility of the accelerator is improved by configuring the accelerator's registers at their corresponding addresses. Experimental results show that, compared with previous CPU-based, GPU-based, and FPGA-based implementations, the proposed accelerator achieves a higher processing speed: latency is reduced by factors of 27, 4, and 4, respectively, and energy efficiency is increased by factors of 471, 182, and 2, respectively. The accelerator is therefore better suited to deployment on terminal devices. |
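
The two algorithm-level steps named in the abstract, fixed-point quantization and Top-k pruning, can be illustrated with a minimal sketch. The abstract does not give the bit widths, the pruning granularity, or the selection criterion, so everything specific here is an assumption made for illustration: a signed 16-bit format with 8 fractional bits, pruning applied per row by magnitude, and the hypothetical function names `quantize_fixed_point`, `top_k_prune_row`, and `compress_matrix`. It should not be read as the authors' implementation.

```python
def quantize_fixed_point(values, frac_bits=8, total_bits=16):
    """Map each value onto a signed fixed-point grid, saturating on overflow.

    Assumed format (not from the paper): total_bits wide, frac_bits fractional.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    quantized = []
    for v in values:
        # Round to the nearest representable step (Python's round: ties to even).
        q = int(round(v * scale))
        # Clamp to the representable integer range.
        q = max(q_min, min(q_max, q))
        # Convert back to a real value so the quantization error is visible.
        quantized.append(q / scale)
    return quantized


def top_k_prune_row(row, k):
    """Keep the k largest-magnitude entries of a row and zero the rest.

    Per-row magnitude selection is an assumed reading of "Top-k pruning".
    """
    if k >= len(row):
        return list(row)
    order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(row)]


def compress_matrix(matrix, k, frac_bits=8, total_bits=16):
    """Prune every row to k non-zeros, then quantize the surviving weights."""
    return [
        quantize_fixed_point(top_k_prune_row(row, k), frac_bits, total_bits)
        for row in matrix
    ]
```

With a fixed `k` per row, every row of the pruned matrix has the same number of non-zeros, which is one plausible reason such a scheme pairs well with a hardware module for sparse matrix operations; whether the paper relies on this property is not stated in the abstract.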