In daily life, the target speech is often seriously corrupted by interference sources and various kinds of noise, so that indicators of the target speech such as PESQ, STOI and SNR drop sharply. This seriously degrades the accuracy of back-end speech recognition and gives listeners a poor hearing experience. Speech separation is an important part of speech signal processing. Its task is to separate the speech of interest from overlapping audio and to remove interference and noise as much as possible, thereby improving the STOI, SNR, PESQ and other indicators of the target speech.

Speech signals are temporal signals: recurrent neural networks can effectively model their temporal features, while convolutional neural networks can effectively extract the structural features of the spectrum. This paper therefore proposes a convolutional gated recurrent neural network for causal speech separation that combines the two. In addition, because the fixed size of the receptive field limits the performance of convolutional neural networks in speech separation, a target speech separation method based on multi-scale feature fusion is proposed. The main contributions are as follows:

(1) A convolutional gated recurrent neural network for single-channel causal speech-noise separation is proposed to address the performance drop that occurs when the input to the separation model is causal. The network combines the advantages of recurrent and convolutional neural networks in speech separation: by substituting convolution operations for the fully connected matrix products in the recurrent neural network, it effectively retains the spectral structure of speech, improves the PESQ, SSNR and STOI of the separated speech, and reduces the number of model parameters. In addition, the output of a network unit at the current time is determined by the input at the current time together with the input and output at the previous time, which makes full use of the feature information of the causal input. This design improves the performance of single-channel speech-noise separation under causal input.

(2) Because the fixed-size receptive field limits the performance of convolutional neural networks in speech separation, a multi-channel target speech separation model based on multi-scale feature fusion is proposed. The model uses group convolution and dilated convolution to extract multi-scale speech features and directional features while reducing the number of model parameters, and it greatly improves the performance of convolutional neural networks in target speech separation. Furthermore, to model the temporal characteristics of speech and improve the quality of the target speech, a temporal convolutional network (TCN) is used to strengthen the temporal modeling ability of the model. The separated target speech shows large improvements in PESQ, STOI, SI-SDR and other indicators.

Experiments on open datasets, and on datasets generated from them, show that the proposed methods substantially improve STOI, PESQ, SSNR and other indicators compared with traditional network structures, while also reducing the number of parameters of the speech separation model.
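The idea of replacing the fully connected matrix products of a gated recurrent unit with convolutions can be sketched as follows. This is a minimal illustrative PyTorch implementation of a generic convolutional GRU cell, not the authors' exact architecture; the class name, kernel size and channel counts are assumptions for the example.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """GRU cell whose gate transforms are 2-D convolutions, so the
    time-frequency structure of the spectrogram is preserved instead
    of being flattened into a vector (illustrative sketch)."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        pad = kernel // 2
        # One convolution jointly produces the update and reset gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, kernel, padding=pad)
        # Convolution for the candidate hidden state.
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel, padding=pad)

    def forward(self, x, h):
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                  # update gate, reset gate
        n = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * n + z * h                 # new hidden state

# One recurrent step on a batch of spectrogram frames.
cell = ConvGRUCell(in_ch=1, hid_ch=16)
x = torch.randn(2, 1, 64, 64)                      # (batch, ch, freq, time)
h = torch.zeros(2, 16, 64, 64)                     # previous hidden state
h = cell(x, h)
print(h.shape)  # torch.Size([2, 16, 64, 64])
```

Because each gate is a convolution rather than a dense matrix, the parameter count depends only on the kernel size and channel counts, not on the spectrogram size, which is one source of the parameter reduction described above.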
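The multi-scale feature extraction with group and dilated convolutions described in contribution (2) can be sketched like this. Again, this is a hypothetical PyTorch example, assuming parallel dilated branches whose outputs are fused by a 1x1 convolution; the dilation rates, group count and channel width are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel grouped, dilated 1-D convolutions over the feature
    sequence: grouping reduces the parameter count, while different
    dilation rates give each branch a different receptive field.
    The branch outputs are fused by a pointwise convolution."""

    def __init__(self, ch, dilations=(1, 2, 4, 8), groups=4):
        super().__init__()
        self.branches = nn.ModuleList(
            # padding = dilation keeps the sequence length unchanged
            # for a kernel of size 3.
            nn.Conv1d(ch, ch, 3, padding=d, dilation=d, groups=groups)
            for d in dilations
        )
        self.fuse = nn.Conv1d(ch * len(dilations), ch, 1)  # 1x1 fusion

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))

block = MultiScaleBlock(ch=32)
y = block(torch.randn(2, 32, 100))                 # (batch, ch, frames)
print(y.shape)  # torch.Size([2, 32, 100])
```

Stacking such blocks with growing dilation rates is also the building pattern of a TCN, which is how the temporal modeling capacity mentioned above is usually obtained without recurrence.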