
Robust Speech Recognition Based On Deep Learning

Posted on: 2022-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Wang
Full Text: PDF
GTID: 2518306554968359
Subject: Information and Communication Engineering
Abstract/Summary:
To improve the performance of speech recognition systems in noisy, reverberant, and other adverse environments, this thesis takes deep learning and microphone array signal processing as its main technical methods and studies several key technologies in robust speech recognition. The main work and contributions are as follows:

(1) A speech recognition system based on a hybrid acoustic model is constructed. The influence of triphones, deep neural networks, and feature transformation on the recognition rate is studied experimentally, and the robustness of these techniques in reverberant and noisy environments is evaluated.

(2) Practical streaming speech recognition demands real-time multi-task pre-processing with low latency and strong robustness to noise. To meet this requirement, a multi-task deep learning model that jointly performs speech enhancement and voice activity detection is proposed. By introducing a long short-term memory network and connecting the output layers of the two tasks, the model forms a causal system suitable for real-time online processing. Experimental results show that, compared with serial processing by the baseline models, the multi-task model improves processing speed by 44.2% while achieving similar speech enhancement results and better detection results, which is of great significance for the application and deployment of deep-learning-based pre-processing models.

(3) To improve the performance of multichannel speech separation in diffuse noise, a spatial covariance model and a parameter estimation method for speech separation and noise reduction are proposed. In this method, diffuse noise is treated as an independent source, the spatial characteristics of each target source are modeled by a spatial covariance matrix reconstructed from its steering vector, and the multichannel Wiener filter for speech separation is estimated by spatial covariance analysis. Moreover, a joint-parameter framework combining this method with a postfilter is proposed, which offers more flexible trade-offs between speech dereverberation and noise reduction in the output signal.

In speech separation experiments in diffuse noise, the proposed method outperformed conventional methods, and the postfilter with joint parameters produced more satisfactory denoised speech, verifying the effectiveness of the proposed model and parameter estimation method. In robust speech recognition experiments, the proposed method improved the recognition rate in both diffuse and point noise and outperformed the other methods, confirming its effectiveness as a front-end processing system for robust speech recognition.
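The causal multi-task structure described in (2) can be sketched as a single LSTM step shared by two output heads: a per-frequency enhancement mask and a frame-level voice-activity probability. This is an illustrative sketch only, not the thesis's actual architecture; the layer sizes are arbitrary and the weights are random placeholders standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultiTaskLSTM:
    """Causal single-layer LSTM with two heads: an enhancement mask per
    frequency bin and a frame-level speech-presence probability.
    Weights are random placeholders; in practice they come from training."""
    def __init__(self, n_freq, n_hidden):
        s = 0.1
        self.Wx = rng.normal(0, s, (4 * n_hidden, n_freq))
        self.Wh = rng.normal(0, s, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_mask = rng.normal(0, s, (n_freq, n_hidden))
        self.W_vad = rng.normal(0, s, (1, n_hidden))
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        # Standard LSTM cell: gates depend only on the current frame and
        # the previous state, so the system is causal by construction.
        z = self.Wx @ x + self.Wh @ h + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        mask = sigmoid(self.W_mask @ h)   # per-bin enhancement mask in (0, 1)
        vad = sigmoid(self.W_vad @ h)[0]  # speech-presence probability
        return h, c, mask, vad

# Streaming use: process spectrum frames one at a time, carrying state.
n_freq, n_hidden = 257, 64
net = MultiTaskLSTM(n_freq, n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
frames = rng.random((5, n_freq))         # stand-in for noisy STFT magnitudes
for x in frames:
    h, c, mask, vad = net.step(x, h, c)
    enhanced = mask * x                  # masked (enhanced) spectrum
```

Because each frame depends only on past state, the same loop runs online with per-frame latency, and the two heads share one forward pass, which is where the speed gain over serial enhancement-then-detection comes from.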
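The core of (3), a multichannel Wiener filter built from a target spatial covariance matrix reconstructed from the steering vector, can be illustrated in a few lines. The array geometry, frequency, and noise model below are hypothetical (diffuse noise is approximated here as spatially white), and the covariances are given analytically rather than estimated from data as in the thesis.

```python
import numpy as np

def steering_vector(mic_positions, doa, freq, c=343.0):
    """Far-field steering vector for a 2-D array and direction of arrival (rad)."""
    unit = np.array([np.cos(doa), np.sin(doa)])
    delays = mic_positions @ unit / c
    return np.exp(-2j * np.pi * freq * delays)

def rank1_mwf(R_x, d, sigma_s2, ref=0):
    """Multichannel Wiener filter with a rank-1 target covariance
    R_s = sigma_s2 * d d^H reconstructed from the steering vector d."""
    R_s = sigma_s2 * np.outer(d, d.conj())
    return np.linalg.solve(R_x, R_s[:, ref])   # w = R_x^{-1} R_s e_ref

# Hypothetical setup: 4-mic linear array (5 cm spacing), one target source,
# spatially white noise, evaluated at a single STFT frequency bin.
mics = np.stack([np.arange(4) * 0.05, np.zeros(4)], axis=1)
d = steering_vector(mics, doa=0.6, freq=1000.0)
sigma_s2, sigma_n2 = 1.0, 0.5
R_x = sigma_s2 * np.outer(d, d.conj()) + sigma_n2 * np.eye(4)
w = rank1_mwf(R_x, d, sigma_s2)

# SNR at the reference mic vs. at the filter output.
snr_in = sigma_s2 / sigma_n2
snr_out = sigma_s2 * abs(w.conj() @ d) ** 2 / (sigma_n2 * np.vdot(w, w).real)
```

With spatially white noise the filter reduces to a matched filter along the steering vector, so the output SNR grows with the number of microphones; in the thesis the noise covariance is instead that of a diffuse field and all parameters are estimated by spatial covariance analysis.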
Keywords/Search Tags:microphone array signal processing, deep learning, speech enhancement, voice activity detection, speech recognition