Research And Design Of Lightweight Anti-noise Speech Recognition Algorithm

Posted on: 2022-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2518306764467184
Subject: Automation Technology
Abstract/Summary:
With the development of speech recognition technology and changing everyday needs, more and more applications include speech recognition functions. In real scenarios, however, environmental noise severely degrades recognition accuracy, to the point that many products lose their usefulness entirely. In addition, many speech recognition applications run on embedded devices, whose limited resources and power budgets seriously hinder the deployment of speech recognition models. Some existing speech recognition models still face great difficulties in both respects.

After researching and comparing many popular speech recognition algorithms, this thesis proposes a new two-stage speech recognition model for efficient recognition in noisy environments: the model first converts the speech feature sequence into a phoneme sequence and then converts the phoneme sequence into the final text sequence. The first-stage acoustic model processes the acoustic features of the speech signal; it is built mainly from convolutional layers with GLU activation functions and is trained with the CTC loss to convert speech features into phonemes. The second-stage language model processes the linguistic features of the phonemes; it is built mainly from a Bi-LSTM network and a self-attention mechanism and is trained with the focal loss to convert phonemes to words in one-to-one correspondence. During training, speech feature enhancement is applied to the input of the acoustic model, and error-prone phonemes in the input of the language model are randomly replaced, which further improves the generalization of the model.

After training on multiple open-source speech datasets and a large amount of low-cost text data, the acoustic model achieves a phoneme error rate of 1.31% on the test set of the Mandarin speech dataset AISHELL-1, and the language model achieves a character accuracy of 99.42%. The complete model, with only 15M parameters, reaches a character error rate (CER) as low as 3.29%, achieving state-of-the-art accuracy.

In terms of noise robustness and light weight, the model has two main features. First, the two-stage design confines the impact of noise to the first stage and separates the evaluation of noise effects from the evaluation of overall recognition accuracy, so anti-noise optimization can be more targeted. Second, each stage has a simpler, more specific goal, so it can use a more streamlined and effective network structure and achieve the desired representational power with fewer parameters, which makes the whole model lighter. Building on these characteristics, this thesis optimizes each of the two stages separately with a number of techniques, including a feature noise reduction algorithm, fine-tuning, a modified loss function, lightweight convolution, and network pruning, and finally obtains a high-accuracy speech recognition model that is both more robust to noise and more lightweight. On a speech test set containing multiple types of noise at signal-to-noise ratios from -5 dB to 10 dB, the complete model with 6M parameters achieves a CER of 5.74% with an average latency of 210 ms.
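The first-stage acoustic model is described as a convolutional network with GLU activations. As a minimal sketch of the gating itself (the abstract gives no layer sizes or kernel widths, so none are assumed), the snippet applies GLU(a, b) = a ⊗ σ(b) to a single frame: the channel vector produced by a convolution is split into halves a and b, and a is gated element-wise by the sigmoid of b, so the output has half as many channels as the convolution produced. The function names are illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)


def glu(channels):
    """Gated linear unit over one frame.

    `channels` is the 2*C-dimensional output of a convolution at one time
    step. It is split into halves a and b, and the result is a * sigmoid(b),
    a C-dimensional vector.
    """
    if len(channels) % 2 != 0:
        raise ValueError("GLU needs an even number of channels")
    half = len(channels) // 2
    a, b = channels[:half], channels[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


frame = [2.0, -1.0, 0.0, 100.0]  # a = [2.0, -1.0], b = [0.0, 100.0]
print(glu(frame))  # [1.0, -1.0]
```

Because σ(b) lies in (0, 1), each output channel is a softly switched copy of its partner in a, which gives a purely convolutional model a way to control information flow without recurrence.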
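The acoustic model is trained with the CTC loss, which lets it emit one label or a blank per frame without a frame-level alignment. The abstract does not say how the thesis decodes these outputs into a phoneme sequence; a common choice, assumed here, is greedy (best-path) decoding: take the most likely label at each frame, merge consecutive repeats, then drop blanks. The blank symbol and the phoneme labels below are hypothetical.

```python
BLANK = "<b>"  # hypothetical blank label


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Best-path CTC decoding.

    `frame_probs` is a list with one {label: probability} dict per frame.
    The most likely label is taken at each frame, consecutive duplicates are
    merged, and blanks are dropped. A blank between two identical labels keeps
    them separate, which is how CTC encodes a genuinely repeated label.
    """
    best_path = [max(probs, key=probs.get) for probs in frame_probs]
    decoded = []
    prev = None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Four frames over hypothetical phoneme labels; the best path is n, n, <b>, i3.
frames = [
    {"n": 0.7, "i3": 0.1, BLANK: 0.2},
    {"n": 0.6, "i3": 0.1, BLANK: 0.3},
    {"n": 0.1, "i3": 0.2, BLANK: 0.7},
    {"n": 0.1, "i3": 0.8, BLANK: 0.1},
]
print(ctc_greedy_decode(frames))  # ['n', 'i3']
```

Greedy decoding is the simplest option; beam search over the CTC outputs is the usual, more accurate alternative.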
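The second-stage language model combines a Bi-LSTM with a self-attention mechanism. The abstract does not specify the attention variant; the sketch below assumes standard scaled dot-product self-attention applied to the Bi-LSTM output vectors, with identity query, key and value projections to keep it short (a trained layer would learn those matrices). Each output vector is a softmax-weighted average over all positions, which lets every phoneme position draw on context from the whole sentence.

```python
import math


def softmax(xs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h):
    """Scaled dot-product self-attention over T vectors of dimension d.

    `h` stands in for the Bi-LSTM outputs. For brevity Q = K = V = h
    (identity projections); a trained layer would learn separate query, key
    and value matrices. Returns the T attended vectors and the T x T weights.
    """
    d = len(h[0])
    scale = 1.0 / math.sqrt(d)
    weights = [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in h])
        for q in h
    ]
    attended = [
        [sum(w * v[j] for w, v in zip(row, h)) for j in range(d)]
        for row in weights
    ]
    return attended, weights


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(hidden)
for row in attn:
    print([round(w, 3) for w in row])  # each row sums to 1
```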
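The language model is trained with the focal loss. In its standard form, for a correct class predicted with probability p_t, FL = -α (1 - p_t)^γ log p_t, which down-weights tokens the model already gets right so that training concentrates on hard ones. The α and γ values below are illustrative, not the thesis's settings, and the "modified loss function" mentioned among the lightweighting techniques is not specified in the abstract.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for a single prediction.

    `probs` is a softmax distribution over the output vocabulary and `target`
    the index of the correct character. With gamma = 0 and alpha = 1 this is
    ordinary cross-entropy, -log(p_t).
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = [0.9, 0.05, 0.05]  # confident and correct: p_t = 0.9
hard = [0.2, 0.7, 0.1]    # mostly wrong: p_t = 0.2
for name, probs in (("easy", easy), ("hard", hard)):
    ce = focal_loss(probs, 0, gamma=0.0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy {ce:.4f}, focal {fl:.4f}")
```

With γ = 2 the modulating factor (1 - p_t)^γ is 0.01 for the easy token and 0.64 for the hard one, so the loss on the confident prediction shrinks a hundredfold while the hard token keeps most of its loss.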
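During training, error-prone phonemes in the language-model input are randomly replaced, so the second stage learns to tolerate the kind of mistakes the acoustic model makes. The abstract does not say how error-prone phonemes or their substitutes are chosen; this sketch assumes a hypothetical confusion table over tonal-pinyin units and a replacement probability `p`. Only the input is corrupted while the target characters stay unchanged, so the model must map the noisy phonemes back to the correct text.

```python
import random

# Hypothetical confusion table over tonal-pinyin units: each entry lists
# units the acoustic model might plausibly output instead.
CONFUSIONS = {
    "shi4": ["si4", "shi2"],
    "si4": ["shi4"],
    "zhong1": ["zong1"],
    "zong1": ["zhong1"],
    "xing2": ["xin2"],
    "xin2": ["xing2"],
}


def corrupt_phonemes(phonemes, confusions=CONFUSIONS, p=0.1, rng=None):
    """Return a copy of `phonemes` in which each confusable unit is replaced,
    with probability p, by a randomly chosen alternative from `confusions`.

    The length is preserved, so one-to-one targets still line up, and the
    caller's list is not modified.
    """
    if rng is None:
        rng = random.Random()
    corrupted = []
    for ph in phonemes:
        alternatives = confusions.get(ph)
        if alternatives and rng.random() < p:
            corrupted.append(rng.choice(alternatives))
        else:
            corrupted.append(ph)
    return corrupted


seq = ["wo3", "shi4", "zhong1", "guo2", "ren2"]  # 我 是 中 国 人
print(corrupt_phonemes(seq, p=0.5, rng=random.Random(0)))
```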
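The results are reported as a phoneme error rate and a character error rate (CER). Both are conventionally the Levenshtein edit distance between hypothesis and reference (substitutions + deletions + insertions) divided by the reference length. The abstract does not describe its scoring script, so the sketch shows this standard definition rather than the thesis's own code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions
    needed to turn `ref` into `hyp`. Works on strings or lists of labels."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """(S + D + I) / N with N the reference length: CER on characters,
    phoneme error rate on phoneme lists."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substitution (气 -> 汽) and one deletion (很): 2 errors over 6 characters.
print(error_rate("今天天气很好", "今天天汽好"))
```

Because insertions count as errors, the error rate can exceed 100% when the hypothesis is much longer than the reference.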
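Lightweight convolution is listed among the techniques used to shrink the model, but the abstract does not say which variant. Assuming, for illustration only, a depthwise separable 1-D convolution (one width-k temporal filter per channel followed by a 1×1 pointwise convolution that mixes channels), the arithmetic below shows where the savings come from: a standard layer needs k·C_in·C_out weights, the separable version k·C_in + C_in·C_out.

```python
def conv1d_params(k, c_in, c_out, bias=True):
    """Parameters of a standard 1-D convolution with kernel width k."""
    return k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    """Parameters of a depthwise (one width-k filter per input channel)
    convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer: kernel width 5, 256 channels in and out.
print(conv1d_params(5, 256, 256))               # 327936
print(depthwise_separable_params(5, 256, 256))  # 67328
```

For this hypothetical layer the separable form needs roughly 4.9× fewer parameters; the ratio grows with the kernel width.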
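Network pruning is another of the listed lightweighting techniques, but the abstract does not give the pruning criterion. The sketch below shows magnitude pruning, a widely used baseline assumed here for illustration: the fraction `sparsity` of weights with the smallest absolute values is set to zero, and a mask records which weights survive.

```python
def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest |w|.

    Returns (pruned_weights, mask) where mask[i] is 1 for kept weights and 0
    for pruned ones. Ties are broken by position, so exactly
    round(sparsity * len(weights)) weights are removed.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = round(sparsity * len(weights))
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(by_magnitude[:n_prune])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [w if m else 0.0 for w, m in zip(weights, mask)], mask


w = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
pruned_w, mask = magnitude_prune(w, 0.5)
print(pruned_w)  # [0.8, 0.0, 0.3, -0.9, 0.0, 0.0]
print(mask)      # [1, 0, 1, 1, 0, 0]
```

In practice the mask is usually kept fixed while the network is fine-tuned to recover accuracy. Structured variants that remove whole channels are often preferred on embedded hardware, because unstructured zeros only save time with sparse kernels.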
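The noisy test set spans signal-to-noise ratios from -5 dB to 10 dB. The abstract does not describe how it was built; a common way to construct such a set, assumed here, is to scale a noise clip so that SNR = 10·log10(P_speech / P_noise) hits the target and then add it to clean speech. Signals are plain lists of samples, and the tone and Gaussian noise are synthetic stand-ins for real recordings.

```python
import math
import random


def power(x):
    """Mean power (average squared amplitude) of a signal."""
    return sum(s * s for s in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Add `noise` to `speech` at the requested SNR (in dB).

    The noise is tiled or truncated to the speech length. Scaling noise by k
    multiplies its power by k**2, so k = sqrt(Ps / (Pn * 10**(snr / 10))).
    Returns the mixture and the scaled noise. Noise must not be all zeros.
    """
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    k = math.sqrt(power(speech) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled = [k * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Synthetic stand-ins: a 440 Hz tone sampled at 16 kHz and Gaussian noise.
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
for target in (-5, 0, 5, 10):
    _, scaled = mix_at_snr(speech, noise, target)
    print(target, round(snr_db(speech, scaled), 6))
```

At -5 dB the noise carries about three times the power of the speech, which is why that end of the range is the hardest test condition.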
Keywords: speech recognition, anti-noise, lightweight model, convolutional neural network, self-attention mechanism