
Robust Speech Recognition Based On Deep Learning

Posted on: 2022-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Wang
Full Text: PDF
GTID: 2518306554968359
Subject: Information and Communication Engineering
Abstract/Summary:
To improve the performance of speech recognition systems in noisy, reverberant, and other adverse environments, this thesis takes deep learning and microphone array signal processing as its main technical methods and studies several key technologies in robust speech recognition. The main work and contributions are as follows:

(1) A speech recognition system based on a hybrid acoustic model is constructed. The influence of triphones, deep neural networks, and feature transformation on the recognition rate is studied experimentally, and the robustness of these techniques in reverberant and noisy environments is evaluated.

(2) Practical streaming speech recognition demands real-time multi-task pre-processing with low latency and strong robustness to noise. To meet this requirement, a multi-task deep learning model that jointly performs speech enhancement and voice activity detection is proposed. By introducing a long short-term memory network and connecting the output layers of the two tasks, the model forms a causal system suitable for real-time online processing. Experimental results show that, compared with serial processing by the baseline models, the multi-task model improves processing speed by 44.2% while achieving similar speech enhancement results and better detection results, which is of great significance for the application and deployment of deep-learning-based pre-processing models.

(3) To improve the performance of multichannel speech separation in diffuse noise, a spatial covariance model and a parameter estimation method for speech separation and noise reduction are proposed. In this method, diffuse noise is treated as an independent source, the spatial characteristics of each target source are modeled by a spatial covariance matrix reconstructed from its steering vector, and the multichannel Wiener filter for speech separation is estimated by spatial covariance analysis. Moreover, a joint-parameter framework combining this method with a postfilter is proposed, which offers more flexible trade-offs between speech dereverberation and noise reduction in the output signal.

In speech separation experiments in diffuse noise, the proposed method outperformed conventional methods, and the postfilter with joint parameters produced more satisfactory denoised speech, verifying the effectiveness of the proposed model and parameter estimation method. In robust speech recognition experiments, the proposed method improved the recognition rate in both diffuse and point noise and outperformed the other methods, confirming its effectiveness as a front-end processing system for robust speech recognition.
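The causal multi-task structure described in (2) can be sketched as a single LSTM step shared by two output heads: a per-frequency enhancement mask and a frame-level voice-activity probability. This is an illustrative sketch only, not the thesis's actual architecture; the layer sizes are arbitrary and the weights are random placeholders standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultiTaskLSTM:
    """Causal single-layer LSTM with two heads: an enhancement mask per
    frequency bin and a frame-level speech-presence probability.
    Weights are random placeholders; in practice they come from training."""
    def __init__(self, n_freq, n_hidden):
        s = 0.1
        self.Wx = rng.normal(0, s, (4 * n_hidden, n_freq))
        self.Wh = rng.normal(0, s, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_mask = rng.normal(0, s, (n_freq, n_hidden))
        self.W_vad = rng.normal(0, s, (1, n_hidden))
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        # Standard LSTM cell: gates depend only on the current frame and
        # the previous state, so the system is causal by construction.
        z = self.Wx @ x + self.Wh @ h + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        mask = sigmoid(self.W_mask @ h)   # per-bin enhancement mask in (0, 1)
        vad = sigmoid(self.W_vad @ h)[0]  # speech-presence probability
        return h, c, mask, vad

# Streaming use: process spectrum frames one at a time, carrying state.
n_freq, n_hidden = 257, 64
net = MultiTaskLSTM(n_freq, n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
frames = rng.random((5, n_freq))         # stand-in for noisy STFT magnitudes
for x in frames:
    h, c, mask, vad = net.step(x, h, c)
    enhanced = mask * x                  # masked (enhanced) spectrum
```

Because each frame depends only on past state, the same loop runs online with per-frame latency, and the two heads share one forward pass, which is where the speed gain over serial enhancement-then-detection comes from.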
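The core of (3), a multichannel Wiener filter built from a target spatial covariance matrix reconstructed from the steering vector, can be illustrated in a few lines. The array geometry, frequency, and noise model below are hypothetical (diffuse noise is approximated here as spatially white), and the covariances are given analytically rather than estimated from data as in the thesis.

```python
import numpy as np

def steering_vector(mic_positions, doa, freq, c=343.0):
    """Far-field steering vector for a 2-D array and direction of arrival (rad)."""
    unit = np.array([np.cos(doa), np.sin(doa)])
    delays = mic_positions @ unit / c
    return np.exp(-2j * np.pi * freq * delays)

def rank1_mwf(R_x, d, sigma_s2, ref=0):
    """Multichannel Wiener filter with a rank-1 target covariance
    R_s = sigma_s2 * d d^H reconstructed from the steering vector d."""
    R_s = sigma_s2 * np.outer(d, d.conj())
    return np.linalg.solve(R_x, R_s[:, ref])   # w = R_x^{-1} R_s e_ref

# Hypothetical setup: 4-mic linear array (5 cm spacing), one target source,
# spatially white noise, evaluated at a single STFT frequency bin.
mics = np.stack([np.arange(4) * 0.05, np.zeros(4)], axis=1)
d = steering_vector(mics, doa=0.6, freq=1000.0)
sigma_s2, sigma_n2 = 1.0, 0.5
R_x = sigma_s2 * np.outer(d, d.conj()) + sigma_n2 * np.eye(4)
w = rank1_mwf(R_x, d, sigma_s2)

# SNR at the reference mic vs. at the filter output.
snr_in = sigma_s2 / sigma_n2
snr_out = sigma_s2 * abs(w.conj() @ d) ** 2 / (sigma_n2 * np.vdot(w, w).real)
```

With spatially white noise the filter reduces to a matched filter along the steering vector, so the output SNR grows with the number of microphones; in the thesis the noise covariance is instead that of a diffuse field and all parameters are estimated by spatial covariance analysis.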
Keywords/Search Tags:microphone array signal processing, deep learning, speech enhancement, voice activity detection, speech recognition