
Research On Monaural Speech Separation Technology Based On Deep Learning Joint Optimization And Feature Fusion

Posted on: 2022-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2518306557970459
Subject: Signal and Information Processing
Abstract/Summary:
Monaural speech separation is the problem of recovering multiple target speech signals from a single mixed speech signal, and it is a branch of blind source separation. It is low-cost and easy to implement, and it is widely applied in many fields. However, the problem is under-determined and therefore difficult to solve, which has made it an active research topic in speech signal processing. With the development of deep learning and deep neural networks, methods for monaural speech separation have advanced considerably. Compared with traditional separation methods, deep neural networks have achieved clear advantages in monaural speech separation by virtue of their strong ability to model relationships in the data. This thesis studies the monaural speech separation problem in depth using Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), and makes a series of improvements. The focus is on the loss function of the separation model, the gradient descent algorithm, and fusion features. Two algorithms are proposed: a monaural speech separation algorithm with joint constraints based on an integrated-optimizer DNN, and a monaural speech separation algorithm based on CNN feature fusion. Theoretical analysis and a series of experiments indicate that the proposed algorithms work well. The contributions of this thesis are summarized as follows:

(1) The historical background, current state, and research significance of the speech separation problem are analyzed. Common solutions to speech separation are then introduced, together with the classification and basic theory of deep neural networks. Several commonly used input features and prediction targets for speech separation are described in detail. On this basis, the deep-neural-network approach to speech separation and the metrics used to measure speech quality after separation are explained.

(2) To improve the performance of traditional DNN-based monaural speech separation, a monaural speech separation algorithm with joint constraints based on an integrated-optimizer DNN is proposed. The loss function of the traditional algorithm considers only the error between the predicted value and the true value, which leaves a larger error between the separated speech and the clean speech. This thesis proposes a new joint constraint loss function that not only constrains the error between the predicted value and the true value, but also penalizes the error between the predicted value and the amplitude spectrum of the target speech. Under this joint constraint, the separation model can be trained better, and the amplitude spectrum of the predicted speech is closer to that of the clean speech. In addition, the traditional gradient descent algorithm has several shortcomings, such as slow convergence, convergence to a local optimum, and excessive variance in the learning rate. To address these problems, this thesis integrates two optimizers into the separation model, which improves both the convergence speed of the loss function and the accuracy of the separation network. Finally, experiments verify the effectiveness of the proposed algorithm.

(3) To address the limited separation performance of DNN-based and CNN-based monaural speech separation when only a single feature is used as input, a monaural speech separation algorithm with joint constraints based on CNN feature fusion is proposed. A CNN can retain the spatial information of features and is better than a traditional DNN at extracting deep features of speech for separation, but some feature information is still lost when it is applied to monaural speech separation. This thesis proposes a CNN structure that adds a feature fusion layer to a traditional CNN: the CNN extracts deep features from multichannel input features, and the fusion layer fuses these deep features with the acoustic features. The fused features are then used to continue training the separation model. Because the deep features and the original acoustic features are fused into a single separation feature, this feature contains abundant speech information and characterizes the speech signal well, which makes the masks predicted by the separation model more accurate. Experimental results show that the proposed algorithm achieves better separation performance than traditional algorithms.
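
As an illustration of the joint constraint loss described in contribution (2), the following is a minimal sketch and not the thesis's actual formulation, which the abstract does not give. It assumes that the "predicted value" is a time-frequency mask, that the estimated amplitude spectrum is the predicted mask multiplied element-wise by the mixture amplitude spectrum, that both terms use mean squared error, and that they are combined with a weight `alpha`. The function names and `alpha` are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2-D lists (frames x bins)."""
    diffs = [
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def joint_constraint_loss(pred_mask, true_mask, mix_mag, target_mag, alpha=0.5):
    """Hypothetical joint loss: mask error plus weighted amplitude-spectrum error.

    pred_mask, true_mask: predicted and reference masks (assumed targets).
    mix_mag, target_mag: amplitude spectra of the mixture and the target speech.
    alpha: assumed weight of the amplitude-spectrum term (not from the source).
    """
    # Term 1: error between the predicted value and the true value.
    mask_term = mse(pred_mask, true_mask)

    # Term 2: error between the estimated and target amplitude spectra,
    # assuming the estimate is mask * mixture amplitude per T-F bin.
    est_mag = [
        [m * y for m, y in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(pred_mask, mix_mag)
    ]
    spectrum_term = mse(est_mag, target_mag)

    return mask_term + alpha * spectrum_term
```

With `alpha=0` this reduces to the traditional single-term loss, so under these assumptions the weight controls how strongly the amplitude-spectrum constraint influences training.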
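
The feature fusion step in contribution (3) can be sketched in a similar spirit. The abstract does not say how the fusion layer combines the two feature sets, so this example assumes the simplest option, frame-wise concatenation of the CNN's deep features with the original acoustic features. The function name `fuse_features` is hypothetical.

```python
def fuse_features(deep_feats, acoustic_feats):
    """Concatenate per-frame deep features with per-frame acoustic features.

    Both arguments are lists of per-frame feature vectors. Concatenation is an
    assumed fusion rule; the source only states that the features are fused.
    """
    if len(deep_feats) != len(acoustic_feats):
        raise ValueError("both inputs need the same number of frames")
    return [list(d) + list(a) for d, a in zip(deep_feats, acoustic_feats)]
```

For example, two frames with 2-dimensional deep features and 1-dimensional acoustic features yield two 3-dimensional fused vectors, which would then be passed to the layers that predict the masks.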
Keywords/Search Tags:Speech Signal Processing, Monaural Speech Separation, Gradient Descent Algorithm, Loss Function, Feature Fusion, Convolutional Neural Network, Deep Neural Network