Research And Design Of Lightweight Anti-noise Speech Recognition Algorithm

Posted on:2022-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:X Li

Full Text:PDF

GTID:2518306764467184

Subject:Automation Technology

Abstract/Summary:

As speech recognition technology develops and everyday needs change, more and more applications include speech recognition functions. In real scenarios, however, environmental noise severely degrades recognition accuracy, leaving many products completely unusable. In addition, many speech recognition applications run on embedded devices, whose limited resources and power budgets seriously hinder the deployment of speech recognition models, and some existing models still struggle on both counts. After researching and comparing many popular speech recognition algorithms, this thesis proposes a new two-stage speech recognition model for efficient recognition in noisy environments: it first converts the speech feature sequence into a phoneme sequence and then converts the phoneme sequence into the final text sequence. The first-stage acoustic model processes the acoustic features of the speech signal. It consists mainly of convolutional layers with the GLU activation function and is trained with the CTC loss to convert speech features into phonemes. The second-stage language model processes the linguistic features of the phonemes. It consists mainly of a Bi-LSTM network and a self-attention mechanism and is trained with the focal loss to convert phonemes into characters in one-to-one correspondence. During training, speech feature augmentation is applied to the input of the acoustic model, and error-prone phonemes in the input of the language model are randomly replaced, which further improves the generalization of the model. After training on multiple open-source speech datasets and a large amount of low-cost text data, the acoustic model achieves a phoneme error rate of 1.31% and the language model achieves a character accuracy of 99.42% on the test set of the Mandarin speech dataset AISHELL-1. The complete model, with only 15M parameters, achieves a character error rate (CER) as low as 3.29%, reaching state-of-the-art accuracy.

In terms of noise robustness and light weight, the model has two main features. First, the two-stage design confines the impact of noise on recognition to the first stage and separates the evaluation of noise effects from the evaluation of overall recognition accuracy, so noise robustness can be optimized in a more targeted way. Second, each stage has a simpler and more specific goal, so it can adopt a more streamlined and effective network structure and reach the desired representational capacity with fewer parameters, which makes the whole model lighter. Building on these characteristics, this thesis applies a number of techniques, including a feature noise reduction algorithm, fine-tuning, a modified loss function, lightweight convolution, and network pruning, to optimize the two stages separately, finally obtaining a high-precision speech recognition model that is both more robust to noise and more lightweight. On a speech test set containing multiple types of noise at signal-to-noise ratios from -5 dB to 10 dB, the complete model with 6M parameters achieves a character error rate (CER) of 5.74% with an average latency of 210 ms.
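The first-stage acoustic model is described as built mainly from convolutional layers with the GLU activation. The sketch below illustrates one such block in plain Python under stated assumptions: a valid 1-D convolution over time produces 2C channels per frame, and GLU splits them into halves a and b and returns a ⊗ σ(b), halving the channel count. The kernel width, channel sizes, lack of padding, and the helper names `conv1d`, `glu`, and `conv_glu_block` are hypothetical; the abstract does not specify them.

```python
import math
from typing import List, Sequence

Frame = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(frame: Sequence[float]) -> Frame:
    """Gated Linear Unit: split channels into (a, b), return a * sigmoid(b)."""
    if len(frame) % 2 != 0:
        raise ValueError("GLU needs an even number of channels")
    half = len(frame) // 2
    a, b = frame[:half], frame[half:]
    return [x * sigmoid(g) for x, g in zip(a, b)]


def conv1d(x: Sequence[Sequence[float]],
           weights: Sequence[Sequence[Sequence[float]]],
           bias: Sequence[float]) -> List[Frame]:
    """Valid 1-D convolution over time (hypothetical helper).

    x: [T][C_in], weights: [C_out][K][C_in], bias: [C_out] -> [T-K+1][C_out]
    """
    k = len(weights[0])
    c_in = len(x[0])
    out = []
    for t in range(len(x) - k + 1):
        frame = []
        for w_o, b_o in zip(weights, bias):
            s = b_o
            for j in range(k):
                for c in range(c_in):
                    s += w_o[j][c] * x[t + j][c]
            frame.append(s)
        out.append(frame)
    return out


def conv_glu_block(x, weights, bias) -> List[Frame]:
    """Convolution producing 2C channels, then GLU gating down to C channels."""
    return [glu(frame) for frame in conv1d(x, weights, bias)]
```

Frameworks expose the gating step directly, for example PyTorch's `torch.nn.functional.glu(input, dim=-1)`, so a real model would only need the convolution layer to output twice the desired channel count.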
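The second-stage language model is trained with the focal loss, FL(p_t) = -(1 - p_t)^γ log p_t, where p_t is the predicted probability of the correct character at a position. The minimal sketch below computes it from raw logits through a numerically stable log-softmax. The focusing parameter γ = 2.0 is an assumed, commonly used default rather than a value reported in the thesis, and the optional α class-weighting term is omitted.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's class scores."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one position: -(1 - p_t)^gamma * log(p_t).

    gamma = 2.0 is an assumed default, not taken from the thesis.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    return -((1.0 - p_t) ** gamma) * log_p_t


def mean_focal_loss(batch_logits: Sequence[Sequence[float]],
                    targets: Sequence[int],
                    gamma: float = 2.0) -> float:
    """Average focal loss over every character position in a batch."""
    losses = [focal_loss(z, t, gamma) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)
```

With γ = 0 the loss reduces to ordinary cross-entropy; larger γ shrinks the contribution of positions the model already predicts confidently, so training concentrates on the harder, ambiguous phoneme-to-character mappings.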
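The abstract states that error-prone phonemes in the language model's input are randomly replaced during training. One way to sketch this, assuming a hand-made confusion table over toned pinyin syllables and a fixed replacement probability (both hypothetical, since the thesis's actual table and rate are not given here), is to swap each confusable phoneme for one of its known confusions while the target character sequence stays unchanged, so the model learns to recover the correct text from slightly wrong phonemes.

```python
import random
from typing import Dict, List, Sequence

# HYPOTHETICAL confusion table (toned pinyin); not taken from the thesis.
CONFUSABLE: Dict[str, List[str]] = {
    "zhi1": ["zi1"],
    "zi1": ["zhi1"],
    "shi4": ["si4"],
    "si4": ["shi4"],
    "nan2": ["lan2"],
    "lan2": ["nan2"],
}


def replace_error_prone(phonemes: Sequence[str],
                        confusable: Dict[str, List[str]],
                        prob: float,
                        rng: random.Random) -> List[str]:
    """Return a copy of `phonemes` where each phoneme with known confusions
    is swapped, with probability `prob`, for one of them.

    Only the model input is perturbed; the caller keeps the original
    character labels as training targets.
    """
    noisy = []
    for ph in phonemes:
        alternatives = confusable.get(ph)
        if alternatives and rng.random() < prob:
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(ph)
    return noisy


if __name__ == "__main__":
    rng = random.Random(42)
    src = ["nan2", "jing1", "shi4"]  # hypothetical input sequence
    print(replace_error_prone(src, CONFUSABLE, prob=0.15, rng=rng))
```

Drawing the perturbation fresh each epoch (rather than once for the dataset) exposes the language model to many different error patterns for the same sentence.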
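The reported results use the character error rate, CER = (S + D + I) / N, where S, D, and I count character substitutions, deletions, and insertions in the minimum edit alignment between hypothesis and reference, and N is the number of reference characters. The stdlib sketch below computes it at the corpus level via Levenshtein distance, summing errors over all utterances before dividing; this is the usual convention, assumed here because the abstract does not state how per-utterance results were aggregated.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference char r
                curr[j - 1] + 1,         # insertion of hypothesis char h
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = curr
    return prev[-1]


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


if __name__ == "__main__":
    refs = ["今天天气很好"]
    hyps = ["今天天汽很好"]  # one substitution out of 6 characters
    print(f"CER = {corpus_cer(refs, hyps):.2%}")  # CER = 16.67%
```

Averaging per-utterance CERs instead would weight short utterances more heavily and generally gives a different number, so the aggregation rule matters when comparing figures such as 3.29% and 5.74%.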
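Lightweight convolution is listed among the compression techniques without naming the variant. Depthwise separable convolution is one common choice and is used here only as an assumed example: it factors a K-tap convolution from C_in to C_out channels into a per-channel K-tap depthwise filter followed by a 1×1 pointwise convolution, reducing the weight count from K·C_in·C_out to K·C_in + C_in·C_out. The sketch counts weights only (biases excluded), and the layer size in the usage example is hypothetical.

```python
def standard_conv1d_weights(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in an ordinary 1-D convolution (bias excluded)."""
    return kernel * c_in * c_out


def depthwise_separable_conv1d_weights(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise + pointwise pair (biases excluded)."""
    depthwise = kernel * c_in  # one K-tap filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c = 11, 256  # hypothetical kernel width and channel count
    std = standard_conv1d_weights(k, c, c)
    sep = depthwise_separable_conv1d_weights(k, c, c)
    print(std, sep, round(std / sep, 1))  # 720896 68352 10.5
```

For wide kernels the pointwise term dominates, so the saving approaches a factor of K; the trade-off is reduced cross-channel modeling inside each temporal filter.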
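The noisy evaluation set spans SNRs from -5 dB to 10 dB. A common way to build such data, assumed here rather than taken from the thesis, is to scale a noise clip so that 10·log10(P_s / P_n) equals the target SNR, where P_s and P_n are the mean powers of the clean speech and the scaled noise, and then add the two. The required gain is g = sqrt(P_s / (P_n · 10^(SNR/10))). The sketch uses plain Python lists in place of real waveforms, and the sample rate and test signals are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def mean_power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(clean: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))


def mix_at_snr(clean: Sequence[float], noise: Sequence[float],
               target_snr_db: float) -> Tuple[List[float], List[float]]:
    """Scale `noise` so clean/noise power equals target_snr_db, then add it.

    Returns (mixture, scaled_noise).
    """
    if len(noise) < len(clean):
        raise ValueError("noise clip must be at least as long as the speech")
    noise = list(noise[: len(clean)])
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise clip is silent")
    gain = math.sqrt(mean_power(clean) / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixture = [c + n for c, n in zip(clean, scaled)]
    return mixture, scaled


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000  # assumed sample rate
    clean = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
    snr = rng.uniform(-5.0, 10.0)  # draw a target SNR from the reported range
    mixture, scaled = mix_at_snr(clean, noise, snr)
    print(f"target {snr:.2f} dB, measured {snr_db(clean, scaled):.2f} dB")
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, so differences in CER can be attributed to the noise alone.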

Keywords/Search Tags:

speech recognition, anti-noise, lightweight model, convolutional neural network, self-attention mechanism

Related items

1	Research On Gesture Recognition Method Based On Improved Convolutional Neural Network
2	Facial Expression Recognition Based On Attention Fusion Convolutional Neural Network
3	Research On Speech Emotion Recognition Based On Convolutional Recurrent Neural Network
4	Speech Emotion Recognition Based On Deep Learning
5	Research On Face Recognition Method Based On Convolutional Neural Network
6	Research On Multi-person Speech Recognition Based On Deep Learning
7	Research On Speech Emotion Recognition Model Based On Deep Neural Network
8	Design Of Mathematical Formula Recognition System Based On Convolutional Neural Network And Attention Mechanism
9	Research On Speech Emotion Recognition Based On Spectrogram And Statistical Features
10	Research On Speech Enhancement Algorithm Based On Attention Fusion Convolutional Neural Network