
Design And Implementation Of Robust Speech Recognition System Based On Deep Neural Network

Posted on: 2022-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhang
Full Text: PDF
GTID: 2518306482973279
Subject: Master of Engineering
Abstract/Summary:
As the first step in human-computer interaction, speech recognition has great practical significance. In real environments, however, speech is often corrupted by noise, reverberation, and speaker variability, which degrades the performance of recognition systems. To address these problems and improve recognition accuracy on noisy speech, this thesis proposes deep-neural-network models for robust recognition of both isolated words and continuous speech. The contributions of this study are as follows:

(1) For robust isolated-word recognition, a transfer autoencoder bi-directional recurrent neural network model is proposed. To extract features shared by noisy and clean speech, a deep autoencoder is first trained to map noisy speech directly to clean speech; its parameters and structure are then transferred into the acoustic model, where the autoencoder serves as the feature extractor and the recurrent network as the classifier; finally, the combined model is applied to isolated-word recognition. Experiments show an average recognition accuracy of 71.92% at the robustness level and 53.96% at the generalization level.

(2) For robust continuous-speech recognition, a model combining a front end and a back end is proposed. The front end applies a traditional speech-enhancement algorithm for noise reduction; the back end improves on a CLSTM network by adding gated units (GTU or GLU), residual connections, and dilated convolutions, yielding the GRDLSTM model used for recognition. To improve robustness, a multi-condition training dataset was constructed. Training proceeds as follows: first, the speech in the multi-condition dataset is enhanced by the front-end module; second, MFCC and GFCC features are extracted and fused into MGCC features, which offer high accuracy and a degree of noise resistance; finally, these features are used for model training and comparison. Experiments show that, when the model is trained on clean speech only, passing noisy speech through the front end improves the PESQ and STOI of the enhanced speech but still yields very poor recognition accuracy; after training on the multi-condition dataset, the model achieves an average recognition accuracy of 75.15% across different SNRs.

(3) To demonstrate the above results more intuitively, a speech recognition system was designed and implemented on top of the two models. The system has four modules: isolated-word speech recognition, Chinese continuous-speech recognition, spectrum representation, and speech noise adding. The two recognition modules use the network models proposed in this thesis; the spectrum representation module displays the time-domain waveform, frequency-domain representation, and spectrogram of an uploaded utterance; the noise-adding module generates a noisy version of an uploaded utterance for a selected noise type and signal-to-noise ratio. Together, the four modules facilitate research on robust speech recognition.
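The transfer step in contribution (1) can be sketched as follows. This is a minimal, untrained toy in numpy, not the thesis's actual network: the dimensions (39-dim features, 64 hidden units, 10 classes) and the single-layer encoder are illustrative assumptions, and the recurrent classifier is simplified to a linear head. The point it shows is the structural idea: the denoising autoencoder's encoder is reused verbatim as the acoustic model's feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy denoising autoencoder: encoder maps noisy frames toward clean frames.
# (In the thesis this would be trained on noisy/clean speech pairs first.)
feat_dim, hidden_dim, n_classes = 39, 64, 10
enc_w = rng.normal(scale=0.1, size=(feat_dim, hidden_dim))
dec_w = rng.normal(scale=0.1, size=(hidden_dim, feat_dim))

def encode(x):
    """Encoder half of the autoencoder: noisy frame -> robust representation."""
    return np.tanh(x @ enc_w)

# Transfer: reuse the encoder as the acoustic model's feature extractor
# and stack a fresh classifier head on top of it.
cls_w = rng.normal(scale=0.1, size=(hidden_dim, n_classes))

def acoustic_model(x):
    h = encode(x)        # transferred feature extractor (weights shared)
    return h @ cls_w     # classifier head trained for word labels
```

Only the classifier head is trained from scratch; the encoder arrives with the similarity features between noisy and clean speech already learned.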
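The gated units named in contribution (2) have standard definitions that can be written down directly. The sketch below shows the GLU (linear path modulated by a sigmoid gate) and the GTU (tanh path modulated by a sigmoid gate) as plain matrix operations; the weight shapes are illustrative, and the thesis applies these gates inside convolutional layers rather than as standalone dense layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(x, w_a, w_b):
    """Gated Linear Unit: linear path A, element-wise gated by sigmoid(B)."""
    return (x @ w_a) * sigmoid(x @ w_b)

def gtu(x, w_a, w_b):
    """Gated Tanh Unit: tanh-activated path, element-wise gated by sigmoid(B)."""
    return np.tanh(x @ w_a) * sigmoid(x @ w_b)
```

The gate lets the network learn, per feature, how much of each path to pass through, which is what makes these units useful for selectively suppressing noisy components.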
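The noise-adding module in contribution (3) mixes a noise signal into clean speech at a user-selected SNR. The thesis does not give its implementation; the function below is the usual power-ratio formulation: scale the noise so that the speech-to-noise power ratio in decibels equals the requested value, then add.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix `noise` into `speech` at the requested signal-to-noise ratio (dB).

    Assumes `noise` is at least as long as `speech`; it is trimmed to match.
    """
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve for the scale factor that makes p_speech / p_scaled = 10^(snr/10).
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

Building the multi-condition training dataset amounts to running every clean utterance through such a mixer over a grid of noise types and SNR values.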
Keywords/Search Tags:Robust speech recognition, Speech enhancement algorithm, Feature extraction, Deep neural network