Single-Channel Speech Enhancement Algorithm Based On Convolutional Recurrent Neural Network

Posted on: 2023-05-02; Degree: Master; Type: Thesis
Country: China; Candidate: S B Wei; Full Text: PDF
GTID: 2568306914481454; Subject: Information and Communication Engineering
Abstract/Summary:
Speech enhancement aims to extract clean target speech from noisy speech while preserving the integrity and intelligibility of the target speech as far as possible. Traditional speech enhancement algorithms based on digital signal processing generally rely on certain prior assumptions; once those assumptions are not satisfied, their performance degrades sharply. Single-channel speech enhancement has less speech information available and, unlike multi-channel methods, cannot exploit spatial information, so it is more challenging. To obtain better enhancement performance, data-driven deep learning methods have been introduced into single-channel speech enhancement in recent years. A neural network model trained on a large amount of data can suppress non-stationary noise and recover clean target speech. This thesis studies single-channel speech enhancement algorithms based on the convolutional recurrent neural network (CRN) model. The main innovations are as follows:

1) In the original CRN, the downsampling convolution layers at different scales can only model local information and cannot effectively integrate the global information of speech. To address this, a multi-scale convolutional recurrent neural network model (MS-CRN) is proposed. By using Bi-LSTM to model the outputs of the convolution layers at different scales, the model can better learn global speech information. Experimental results show that the SI-SNR of MS-CRN is about 0.4 dB higher than that of the original CRN at 0 dB SNR.

2) To help the model learn features along different dimensions of a speech sequence and increase its "attention" to valid speech units, a multi-path convolutional recurrent network model integrating an attention mechanism (MP_ATT_CRN) is proposed. In this model, Bi-LSTM layers first model the sequence features produced by the multi-layer downsampling convolutions in the CRN along the time dimension and the frequency dimension separately, enriching what the model learns. The learned sequence features are then fed into an attention module, which increases the weight of effective speech units during training and improves model performance. Experimental results show that, at 0 dB SNR, the SI-SNR of MP_ATT_CRN is about 1.0 dB and 0.6 dB higher than that of the original CRN and MS-CRN models, respectively.
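The abstract reports results as SI-SNR gains at 0 dB input SNR but does not spell out either quantity. The following is a minimal sketch, assuming the commonly used definition of scale-invariant SNR (mean removal, projection of the estimate onto the target, then an energy ratio in dB); it is not the thesis's evaluation code. The helper names `power`, `mix_at_snr`, and `si_snr`, the 440 Hz test tone, and the Gaussian noise are all illustrative assumptions.

```python
import math
import random


def power(signal):
    """Mean power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (assumed standard definition, not from the thesis)."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean of each signal so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target: this is the "target" component.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]
    # Whatever is left after the projection is treated as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10 * math.log10(num / den + eps)


# Illustrative data: a 1-second 440 Hz tone stands in for clean speech.
rng = random.Random(0)
sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Build a 0 dB SNR mixture, the test condition quoted in the abstract.
mixture = mix_at_snr(clean, noise, snr_db=0.0)
print(f"SI-SNR of the unprocessed mixture: {si_snr(mixture, clean):.2f} dB")
```

Under this definition, an enhancement model's SI-SNR improvement would be the difference between `si_snr(enhanced, clean)` and `si_snr(mixture, clean)`; because the estimate is projected onto the target, rescaling the model output does not change the score.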
Keywords/Search Tags: speech enhancement, multi-scale, CRN, multi-path, attention mechanism