Research On Speech Enhancement Technology Based On Deep Learning

Posted on: 2024-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: F J Li
GTID: 2568306935984719
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of modern technology, speech interaction between humans and machines is becoming widely used. However, various types of noise interference are prevalent in real speech communication scenarios, and they can degrade both the quality of speech communication and the performance of speech-related systems. It is therefore important to study speech enhancement techniques that improve the reliability and quality of speech communication.

Speech enhancement models based on deep learning achieve good results but still have shortcomings. In the mainstream complex convolutional recurrent network framework, the convolutional kernels of the encoder-decoder have a fixed size, the network does not fully use the structural information of the spectral features during decoding, and its ability to extract and restore features is limited. In addition, the recurrent network between the encoder and decoder cannot capture long-range temporal correlation and global temporal context, and the generalization performance of the speech enhancement network under mismatched noise and unknown signal-to-noise ratios still needs improvement. To address these issues, this thesis studies deep learning-based speech enhancement techniques. The main work is as follows:

(1) A speech enhancement method based on a complex selective kernel and hybrid attention is proposed. The method designs a complex selective kernel network structure to process the complex-domain spectral features of speech. It uses convolutional kernels of two sizes in the encoder-decoder and extracts multi-scale local features through two branches, which strengthens the feature modeling capability of the encoder-decoder. A hybrid attention module that combines a channel attention mechanism and a spatial attention mechanism is also introduced. It suppresses the redundant information that the skip connections feed into the decoder features and helps the decoder focus on the channel locations and spatial content that are useful for estimating the training target. The method improves the ability of the complex encoder-decoder to extract and restore features and increases the representation capability of the complex convolutional layers. Ablation and performance evaluation experiments on the THCHS-30 and VBD datasets show that the proposed model achieves better speech enhancement performance than the baseline DCCRN and other mainstream complex speech enhancement networks.

(2) The recurrent neural network between the encoder and decoder is replaced with a temporal convolutional network, and a parallel temporal attention module is introduced. Together they improve the network's long-sequence modeling capability and its ability to model the global temporal context. In addition, following the idea of multi-task learning, a parallel signal-to-noise ratio prediction task is added to the original network as an auxiliary task, so that the speech enhancement network learns additional features related to the signal-to-noise ratio. This improves the adaptability and generalization of the network to unknown noise types and unknown signal-to-noise ratios. Experiments with the resulting multi-task learning speech enhancement network on the THCHS-30 and Voice Bank datasets show that the proposed method generalizes better.
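The two-branch selective kernel idea in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's network: it assumes real-valued 1-D features instead of complex spectral features, fixed kernels instead of learned ones, and two hypothetical scalar gates (gate_small, gate_large) standing in for the learned layers that would normally produce the branch attention scores. All function names are illustrative.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1-D correlation with an odd-length kernel (output length == len(x))."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel_1d(x, kernel_small, kernel_large, gate_small, gate_large):
    """Fuse two branches with different receptive fields by soft attention.

    gate_small and gate_large are hypothetical scalars (an assumption of this
    sketch) that replace the learned layers of a real selective kernel unit.
    """
    # Split: run the input through a small-kernel and a large-kernel branch.
    branch_small = conv1d_same(x, kernel_small)
    branch_large = conv1d_same(x, kernel_large)

    # Fuse: sum the branches, then global-average-pool to one descriptor.
    fused = [a + b for a, b in zip(branch_small, branch_large)]
    descriptor = sum(fused) / len(fused)

    # Select: softmax across branches gives weights that sum to 1.
    w_small, w_large = softmax([gate_small * descriptor, gate_large * descriptor])
    output = [
        w_small * a + w_large * b
        for a, b in zip(branch_small, branch_large)
    ]
    return output, (w_small, w_large)


if __name__ == "__main__":
    features = [1.0, 2.0, 3.0, 4.0]
    out, weights = selective_kernel_1d(
        features,
        kernel_small=[0.25, 0.5, 0.25],
        kernel_large=[0.1, 0.2, 0.4, 0.2, 0.1],
        gate_small=0.5,
        gate_large=-0.5,
    )
    print("branch weights:", weights)
    print("fused output:", out)
```

Because the branch weights come from a softmax, they always sum to 1, so the output is a convex combination of the two branches. In a trained network the weights would depend on the input through learned parameters, which lets the layer shift its effective receptive field from one input to the next.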
Keywords/Search Tags: Speech Enhancement, Selective Kernel Networks, Attention Mechanism, Temporal Convolutional Networks, Multi-task Learning