The Design And FPGA Verification Of End-to-end Mandarin Speech Recognition Based On CNN

Posted on:2022-08-26

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

Full Text:PDF

GTID:2518306740993539

Subject:IC Engineering

Abstract/Summary:

As a primary interface for human-computer interaction, speech recognition is widely used in fields such as smart speakers, smart homes, and automotive electronics. With their powerful nonlinear expression and feature extraction capabilities, convolutional neural networks (CNNs) have been applied by many researchers to the acoustic models of speech recognition algorithms. However, compared with traditional speech recognition algorithms, CNN-based algorithms have more parameters, require more computation, and place higher demands on hardware, which makes them difficult to deploy on mobile terminals. Realizing efficient, fast speech recognition through software and hardware co-design therefore has important practical significance.

In this thesis, an end-to-end speech recognition algorithm is designed based on convolutional neural networks. Introducing a convolutional attention module improves the network's ability to model acoustic features. Using a lower-complexity spectrogram as the input feature reduces feature-extraction time while preserving most of the information in the input speech. Optimizing the network structure and training with connectionist temporal classification (CTC) improves model performance without increasing the number of model parameters. Data augmentation expands the diversity of small data sets and greatly improves the recognition accuracy of the model. For the convolutional neural network accelerator, the convolution calculation module, the mode controller, the data buffer module, the intermediate buffer module, and the result processing module are designed, and functional simulation of each module is completed. Finally, an FPGA verification system is built and the algorithm is ported to it to verify the effectiveness of the speech recognition algorithm.

The CNN-based speech recognition algorithm designed in this thesis achieves 82.4% accuracy on the THCHS-30 data set. The experimental results on the FPGA verification platform show that, at a clock frequency of 100 MHz, the effective computing power of the convolutional neural network accelerator reaches 53.2 GOPS, with a performance-to-power ratio of 9.9 GOPS/W. The latency from the end of voice input to the completion of recognition is about 274 ms. The research in this thesis offers a reference for future implementations of high-accuracy, low-latency speech recognition systems.
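The abstract names connectionist temporal classification (CTC) as part of the end-to-end design but does not say how network outputs are turned into a label sequence. The following is a minimal sketch of greedy (best-path) CTC decoding, one common way to do this; it is not taken from the thesis. The function name `ctc_greedy_decode`, the blank index `0`, and the toy score matrix are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Step 1: pick the highest-scoring label index for every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Step 2: collapse runs of identical labels into one label.
    # Step 3: remove blank symbols from the collapsed sequence.
    return [label for label, _ in groupby(best_path) if label != blank]


# Toy example: 7 frames, 3 classes (0 = blank, 1 and 2 = hypothetical labels).
toy_scores = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
    [0.20, 0.10, 0.70],
    [0.80, 0.10, 0.10],
]
decoded = ctc_greedy_decode(toy_scores)
print(decoded)
```

In this toy input the per-frame best path is 1, 1, blank, 1, 2, 2, blank. The blank between the two runs of label 1 is what lets the decoder keep both occurrences instead of merging them.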

Keywords/Search Tags:

Speech Recognition, Convolutional Neural Network, Spectrogram, Connectionist Temporal Classification, FPGA Accelerator

Related items

1	Research On End-to-end Speech Recognition Based On Convolutional Neural Networks
2	Research On Connectionist Temporal Classification In Speech Recognition
3	Research On Speech Emotion Recognition Algorithm Based On Deep Learning
4	The Speech Emotion Recognition Research Based On Speech Spectrogram And Convolutional Neural Network
5	Construction And Experiment Of Acoustic Model Based On CNN
6	Research And Implementation Of Speech Recognition Algorithm Based On Recurrent Neural Network
7	Chinese Speech Recognition System Based On CLDNN Hybrid Model
8	Speech Emotion Recognition Based On Spectrogram And Neural Network
9	Research On Tibetan Speech Recognition Based On Bidirectional Recurrent Neural Network
10	Research On CTC-based And Attention-based End-to-end Speech Recognition