
Research Of Monaural Speech Enhancement Under Extremely Low Signal-to-noise Ratio

Posted on: 2024-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
GTID: 2558307079960729
Subject: Software engineering
Abstract/Summary:
With the advent of the intelligent era, voice interaction with communication devices has become an important mode of human-computer interaction, and speech-related tasks have become a research focus in both academia and industry. Speech enhancement aims to remove the noise components of a speech signal and to improve its quality and intelligibility. It can act as a pre-processing step that supplies high-quality speech to automatic speech recognition, and it can also support speech services in hearing aids and communication devices. Many speech enhancement methods have been proposed, including statistics-based digital signal processing methods and neural-network methods, but these algorithms perform poorly in complex noise environments with an extremely low signal-to-noise ratio (SNR). This thesis focuses on speech enhancement in such complex noise environments, analyzes the limitations of existing algorithms under extremely low SNR, and identifies three key scientific problems:

(1) Limited speech feature extraction. Speech features are sparse under extremely low SNR. Moreover, speech features usually exhibit long-range dependence in the time domain and global cross-band relationships in the frequency domain, which are difficult for neural networks to model accurately.
(2) Phase information error. Under extremely low SNR, reconstructing the speech waveform directly with the noisy phase introduces distortion into the enhanced speech.
(3) Generalization to complex noise environments. Noise conditions in real environments vary in complex ways, including changes in SNR, noise type, and noise stationarity, which leads to poor generalization of the model.

To address these three problems, this thesis proposes the following solutions:

(1) A multi-scale information aggregation network is constructed to extract the time-frequency dependencies of speech features. First, a multi-scale perception module combining dilated convolution and conventional convolution is proposed to model different local phoneme patterns. Second, a non-local module is applied to extract global cross-band features, achieving multi-scale information perception and adaptive feature extraction for both speech and noise.
(2) A dual-path collaborative learning network is constructed to alleviate the phase error problem. The network uses both the complex spectrum and the magnitude spectrum as training targets: the magnitude spectrum better maps structured speech information, while the complex spectrum optimizes the phase information.
(3) An environment-aware adaptive module and a noise template library based on unsupervised contrastive learning are designed. First, the environment perception module extracts features from complex noise environments, including SNR, voiceprint, and noise type. Second, unsupervised learning and contrastive learning are introduced to construct the noise template library and restructure the training framework: the noisy speech is compared with the noise in the template library to obtain prior knowledge of the noise.

Finally, this thesis implements a speech enhancement verification system for extremely low SNR environments. The system integrates a variety of speech enhancement and recognition algorithms to compare and verify their performance, and it visualizes speech spectrograms to demonstrate the effectiveness of the proposed algorithms.
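For reference, the SNR of a mixture is 10·log10(P_speech / P_noise), so at an extremely low SNR such as -10 dB the noise power is ten times the speech power. The sketch below, assuming NumPy and a synthetic sinusoid standing in for real speech, shows one common way to build a noisy mixture at a chosen SNR. The helper names `mix_at_snr` and `measure_snr_db` are hypothetical, and the -10 dB target is illustrative rather than a value taken from the thesis.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`, then add it to `speech`."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    target_noise_power = p_speech / 10.0 ** (snr_db / 10.0)
    return speech + np.sqrt(target_noise_power / p_noise) * noise

def measure_snr_db(clean, noisy):
    """SNR of a mixture, treating everything except `clean` as noise."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(0)
fs = 16000                                      # 1 s at 16 kHz (illustrative assumption)
t = np.arange(fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 220.0 * t)    # synthetic stand-in for a speech signal
noise = rng.standard_normal(fs)                 # white Gaussian noise
noisy = mix_at_snr(speech, noise, snr_db=-10.0)
print(round(measure_snr_db(speech, noisy), 3))  # -10.0
```

Scaling the noise rather than the speech keeps the clean reference unchanged, so the same clean signal can serve as the training target at every SNR level.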
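The multi-scale perception and non-local ideas in solution (1) can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions, not the thesis's network. The feature map is treated as (frequency bands × frames). The conventional and dilated branches are depthwise temporal convolutions with kernel size 3 and dilations 1, 2 and 4. The cross-band global module is an embedded-Gaussian non-local block in which each frequency band attends to every other band. The layer sizes, dilations and function names are hypothetical, and random weights stand in for learned parameters.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Depthwise 1-D convolution along time with 'same' zero padding.

    x: (C, T) features (here C = frequency bands); kernel: (C, K) with odd K.
    dilation=1 is an ordinary convolution; dilation>1 skips frames to widen the receptive field.
    """
    T = x.shape[1]
    K = kernel.shape[1]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for k in range(K):
        out += kernel[:, k:k + 1] * xp[:, k * dilation:k * dilation + T]
    return out

def multi_scale_block(x, kernels, dilations=(1, 2, 4)):
    """Parallel conventional (d=1) and dilated (d=2, 4) branches with ReLU, summed with a residual.
    With K=3 the temporal receptive fields are 3, 5 and 9 frames."""
    branches = [np.maximum(dilated_conv1d(x, w, d), 0.0) for w, d in zip(kernels, dilations)]
    return x + np.sum(branches, axis=0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_freq(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block that lets every frequency band attend to all others.

    x: (F, T); each band is a token whose feature vector is its time trajectory.
    w_theta, w_phi, w_g: (T, D) projections; w_out: (D, T) maps back to the input size.
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g           # each (F, D)
    attn = softmax(theta @ phi.T / np.sqrt(theta.shape[1]))   # (F, F) cross-band affinities
    return x + (attn @ g) @ w_out                              # residual connection

rng = np.random.default_rng(0)
F, T, D = 64, 100, 16
x = rng.standard_normal((F, T))                     # stand-in for a (bands x frames) feature map
kernels = [0.1 * rng.standard_normal((F, 3)) for _ in range(3)]
h = multi_scale_block(x, kernels)
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((T, D)) for _ in range(3))
w_out = 0.1 * rng.standard_normal((D, T))
y = non_local_freq(h, w_theta, w_phi, w_g, w_out)
print(y.shape)  # (64, 100)
```

Because the non-local projections here act on each band's entire time trajectory, the weights are tied to a fixed number of frames; a trainable model would more likely apply cross-band attention frame by frame over channel features.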
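To make the phase problem and solution (2) concrete, the sketch below assumes NumPy and synthetic complex spectra in place of real STFT outputs. It pairs a magnitude-spectrum loss with a complex-spectrum loss. The weighting `alpha` and the function names are hypothetical, since the abstract does not state how the two training targets are combined. Pairing a perfect magnitude estimate with the noisy phase gives zero magnitude loss but a clearly non-zero complex loss, which is exactly the phase error that a complex-spectrum target can penalize.

```python
import numpy as np

def magnitude_loss(est_mag, ref):
    """MSE between an estimated magnitude spectrum and the clean magnitude |S|."""
    return np.mean((est_mag - np.abs(ref)) ** 2)

def complex_loss(est, ref):
    """MSE on real and imaginary parts; unlike the magnitude loss it also penalizes phase errors."""
    return np.mean((est.real - ref.real) ** 2 + (est.imag - ref.imag) ** 2)

def dual_path_loss(est_complex, est_mag, ref, alpha=0.5):
    """Hypothetical joint objective: alpha weights the magnitude path against the complex path."""
    return alpha * magnitude_loss(est_mag, ref) + (1.0 - alpha) * complex_loss(est_complex, ref)

rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, frames), e.g. a 512-point STFT (illustrative)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 3.0 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))  # SNR of about -9.5 dB
noisy = clean + noise

# Oracle magnitude combined with the noisy phase: perfect magnitude, wrong phase.
oracle_mag_noisy_phase = np.abs(clean) * np.exp(1j * np.angle(noisy))
print(magnitude_loss(np.abs(oracle_mag_noisy_phase), clean))  # close to 0 (rounding error only)
print(complex_loss(oracle_mag_noisy_phase, clean))            # clearly greater than 0
```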
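Solution (3) compares noisy speech with a noise template library to obtain prior knowledge of the noise. One possible realization, sketched here in NumPy under stated assumptions, uses cosine similarity between embeddings as a soft lookup and the InfoNCE loss as the contrastive objective. InfoNCE is a widely used contrastive loss swapped in for illustration. The embeddings, temperature, library size and function names are all hypothetical rather than taken from the thesis.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def info_nce(queries, keys, temperature=0.1):
    """Contrastive loss: row i of `keys` is the positive for row i of `queries`;
    all other rows in the batch serve as negatives."""
    logits = l2_normalize(queries) @ l2_normalize(keys).T / temperature  # (B, B)
    return -np.mean(np.diag(log_softmax(logits, axis=1)))

def noise_prior(query, templates, temperature=0.1):
    """Soft template lookup: weight each library entry by its cosine similarity
    to the noisy-speech embedding and return the weighted noise prior."""
    sims = l2_normalize(templates) @ l2_normalize(query) / temperature  # (K,)
    weights = np.exp(log_softmax(sims))
    return weights @ templates, weights

rng = np.random.default_rng(0)
templates = rng.standard_normal((8, 32))               # hypothetical library: 8 noise embeddings of size 32
query = templates[3] + 0.1 * rng.standard_normal(32)   # noisy speech whose noise resembles template 3
prior, weights = noise_prior(query, templates)
print(int(np.argmax(weights)))  # 3

views_a = rng.standard_normal((8, 32))                  # embeddings of 8 noise clips
views_b = views_a + 0.05 * rng.standard_normal((8, 32)) # a second "view" of each clip
print(info_nce(views_a, views_b) < info_nce(views_a, views_b[::-1]))  # True
```

Here the library entries are fixed vectors. Because the abstract states that the template library is built with unsupervised contrastive learning, a loss in the spirit of `info_nce` would be what shapes such embeddings during training: matched views of the same noise are pulled together and mismatched ones pushed apart.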
Keywords/Search Tags: Speech enhancement, extremely low signal-to-noise ratio, complex noise environments, multi-scale networks, collaborative learning