
The Design And FPGA Verification Of End-to-end Mandarin Speech Recognition Based On CNN

Posted on: 2022-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2518306740993539
Subject: IC Engineering
Abstract/Summary:
As a primary interface for human-computer interaction, speech recognition is widely used in fields such as smart speakers, smart homes, and automotive electronics. Because of their strong nonlinear expression and feature extraction capabilities, convolutional neural networks (CNNs) have been applied by many researchers to the acoustic models of speech recognition algorithms. However, compared with traditional speech recognition algorithms, CNN-based algorithms have more parameters, require more computation, and place higher demands on hardware, which makes them difficult to deploy on mobile terminals. Realizing efficient and fast speech recognition through software and hardware co-design therefore has important practical significance.

In this thesis, an end-to-end speech recognition algorithm is designed based on convolutional neural networks. Introducing a convolutional attention module improves the network's acoustic modeling ability. Using a lower-complexity spectrogram as the input feature saves feature extraction time while preserving most of the information in the input speech. Optimizing the network structure and using connectionist temporal classification (CTC) improves model performance without increasing the number of model parameters. Data augmentation expands the diversity of small data sets and greatly improves the recognition accuracy of the model.

For the convolutional neural network accelerator, the convolution calculation module, the mode controller, the data buffer module, the intermediate buffer module, and the result processing module are designed, and functional simulation of each module is completed. Finally, a verification system is built on an FPGA platform and the algorithm is ported to it to verify the effectiveness of the speech recognition algorithm.

The CNN-based speech recognition algorithm designed in this thesis achieves 82.4% accuracy on the THCHS-30 data set. The experimental results on the FPGA verification system show that, at a clock frequency of 100 MHz, the effective computing power of the convolutional neural network accelerator reaches 53.2 GOPS, and the performance-to-power ratio is 9.9 GOPS/W. The delay from the end of voice input to the completion of recognition is about 274 ms. This work provides a reference for future implementations of speech recognition systems with high accuracy and low latency.
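The reported accelerator figures can be cross-checked with simple arithmetic. The sketch below is not from the thesis: it only divides the quoted numbers to estimate the implied power draw and the operations completed per clock cycle. The function names are hypothetical, and the convention that one multiply-accumulate (MAC) counts as two operations is an assumption that the abstract does not state.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 100e6              # 100 MHz clock frequency
THROUGHPUT_GOPS = 53.2        # effective computing power
EFFICIENCY_GOPS_PER_W = 9.9   # performance-to-power ratio


def implied_power_watts(gops: float, gops_per_watt: float) -> float:
    """Power implied by dividing throughput by the performance-to-power ratio."""
    return gops / gops_per_watt


def ops_per_cycle(gops: float, clock_hz: float) -> float:
    """Average number of operations completed in each clock cycle."""
    return gops * 1e9 / clock_hz


def macs_per_cycle(gops: float, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Average MACs per cycle.

    Assumption (not stated in the abstract): one MAC counts as two operations.
    """
    return ops_per_cycle(gops, clock_hz) / ops_per_mac


if __name__ == "__main__":
    power = implied_power_watts(THROUGHPUT_GOPS, EFFICIENCY_GOPS_PER_W)
    print(f"Implied power: {power:.2f} W")
    print(f"Operations per cycle: {ops_per_cycle(THROUGHPUT_GOPS, CLOCK_HZ):.0f}")
    print(f"MACs per cycle (assumed 2 ops/MAC): {macs_per_cycle(THROUGHPUT_GOPS, CLOCK_HZ):.0f}")
```

Under these assumptions the quoted figures imply a power draw of roughly 5.4 W and about 532 operations per cycle on average.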
Keywords/Search Tags: Speech Recognition, Convolutional Neural Network, Spectrogram, Connectionist Temporal Classification, FPGA Accelerator