Single-Channel Speech Enhancement Algorithm Based On Convolutional Recurrent Neural Network

Posted on:2023-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:S B Wei

GTID:2568306914481454

Subject:Information and Communication Engineering

Abstract/Summary:

Speech enhancement aims to extract clean target speech from noisy speech while preserving, as far as possible, the integrity and intelligibility of the target speech. Traditional speech enhancement algorithms based on digital signal processing generally rely on certain prior assumptions; once those assumptions are not satisfied, their performance degrades considerably. Single-channel speech enhancement has less speech information available and cannot exploit spatial information as multi-channel methods can, so it is more challenging. To obtain better enhancement performance, data-driven deep learning methods have been introduced into single-channel speech enhancement in recent years. A neural network model trained on a large amount of data can suppress noise under non-stationary noise conditions and recover clean target speech. This thesis studies single-channel speech enhancement based on the convolutional recurrent neural network (CRN) model. The main innovations are as follows:

1) The downsampling convolution layers at different scales in the original CRN can only model local information and cannot effectively integrate the global information of speech. To address this, a multi-scale convolutional recurrent neural network model (MS-CRN) is proposed. By using Bi-LSTM to model the outputs of the convolution layers at different scales, the model can better learn global speech information. Experimental results show that the SI-SNR of MS-CRN is about 0.4 dB higher than that of the original CRN at an input SNR of 0 dB.

2) To help the model learn features along different dimensions of a speech sequence and increase the attention paid to valid speech units, a multi-path convolutional recurrent network model with an integrated attention mechanism (MP_ATT_CRN) is proposed. In this model, Bi-LSTM layers first model the sequence features downsampled by the multi-layer convolution in CRN along the time dimension and the frequency dimension separately, which enriches what the model learns. The learned sequence features are then fed into an attention module, which raises the weight of effective speech units during training and improves model performance. Experimental results show that, at an input SNR of 0 dB, the SI-SNR of MP_ATT_CRN is about 1.0 dB higher than that of the original CRN and about 0.6 dB higher than that of MS-CRN.
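
The results above are reported as SI-SNR at an input SNR of 0 dB. The abstract does not give the exact evaluation procedure, so the following is only a minimal sketch of the commonly used scale-invariant SNR definition, together with a hypothetical helper for mixing noise at a chosen SNR. The function names `si_snr` and `mix_at_snr`, the `eps` constant, and the synthetic sine and cosine signals are illustrative assumptions, not taken from the thesis.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target to get the scaled target component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    s_target = [(dot / tgt_energy) * t for t in tgt]
    # Whatever remains after the projection is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    target_power = sum(s * s for s in s_target)
    noise_power = sum(x * x for x in e_noise)
    return 10.0 * math.log10((target_power + eps) / (noise_power + eps))


def mix_at_snr(clean, noise, snr_db):
    """Hypothetical helper: scale `noise` so that clean + noise has the given SNR.

    Assumes both sequences have the same length and non-zero power.
    """
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [c + gain * v for c, v in zip(clean, noise)]


# Synthetic stand-ins for speech and noise (assumption: real data would be audio).
clean = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
noise = [math.cos(2 * math.pi * 13 * i / 100) for i in range(100)]

noisy_0db = mix_at_snr(clean, noise, 0.0)
print(round(si_snr(noisy_0db, clean), 3))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why this metric is often preferred over plain SNR when a model's output level is not constrained. A gain such as the 0.4 dB or 1.0 dB quoted above would correspond to the difference between two such scores computed on the same test mixtures.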

Keywords/Search Tags:

speech enhancement, multi-scale, CRN, multi-path, attention mechanism

Related items

1	Research On Speech Enhancement Method Of Gated Network Based On Multi-head Attention Mechanism
2	Research On Speech Enhancement Algorithm Based On Multi-head Attention Mechanism
3	Research On Single Channel Speech Enhancement Based On Multi-head Attention Mechanism
4	Research On Low Light Image Enhancement Network Based On Multi-Scale Feature Extraction
5	Research On Crowd Counting Method Based On Multi-Scale Attention Mechanism
6	Research On Long Time Sequence Speech Enhancement Based On Multi-task Learning
7	Research On Speech Enhancement Algorithm Based On Deep Learning
8	Multi-modal Speech Emotion Recognition Based On The Attention Mechanism
9	Speech Enhancement Based On Perceptual Multilevel Mixed Attention Skip Connection
10	Research On Panoramic Segmentation Network Based On Feature Enhancement