Research Of Monaural Speech Enhancement Under Extremely Low Signal-to-noise Ratio

Posted on:2024-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:J J Li

Full Text:PDF

GTID:2558307079960729

Subject:Software engineering

Abstract/Summary:

With the advent of the era of intelligent technology, the combination of voice and communication equipment has become an important means of human-computer interaction, and speech-related tasks have become a research focus in academia and industry. Speech enhancement aims to remove the noise components of speech signals and improve the quality and intelligibility of speech. It can provide high-quality speech for automatic speech recognition as a pre-processing step, and it can also serve hearing aids and communication devices. Many speech enhancement methods have been proposed to date, including statistics-based digital signal processing methods and neural network methods, but these algorithms do not work well in complex noise environments with an extremely low signal-to-noise ratio (SNR).

This thesis focuses on speech enhancement in complex noise environments, analyzes the limitations of existing speech enhancement algorithms under extremely low SNR, and summarizes three key scientific problems to be solved: (1) Limited speech feature extraction. Speech features are sparse under extremely low SNR. Moreover, speech features usually exhibit long-sequence dependence in the time domain and global cross-band relationships in the frequency domain, both of which are difficult for neural networks to model accurately. (2) Phase information error. Under extremely low SNR, the noisy phase is used directly to restore the speech waveform, which distorts the enhanced speech. (3) Generalization to complex noise environments. Noise conditions in real environments vary in complex ways, including changes in SNR, noise type and noise stationarity, resulting in poor generalization performance of the model.

To address these three problems, this thesis puts forward the following solutions: (1) A multi-scale information aggregation network is constructed to extract the time-frequency dependence of speech features. First, a multi-scale perception module combining dilated convolution and conventional convolution is proposed to model different local phoneme patterns. Second, a non-local module is applied to extract cross-band global features, realizing multi-scale information perception and adaptive feature extraction of speech and noise. (2) A dual-path collaborative learning network is constructed to alleviate the phase error problem. The network uses the complex spectrum and the amplitude spectrum as training targets: the amplitude spectrum can better map the structured speech information, and the complex spectrum can optimize the phase information. (3) An environment perception adaptive module and a noise template library based on unsupervised contrastive learning are designed. First, an environment perception module is designed to extract features of complex noise environments, including SNR, voiceprint and noise type. Second, the ideas of unsupervised learning and contrastive learning are introduced to construct the noise template library and restructure the training framework; the noisy speech is compared with the noise in the template library to obtain prior noise knowledge.

Finally, this thesis implements a speech enhancement verification system for extremely low SNR environments, which integrates a variety of speech enhancement and recognition algorithms to compare and verify their performance, and visualizes the speech spectrum to demonstrate the effectiveness of the proposed algorithm.
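The phase error described in problem (2) can be illustrated with a small NumPy sketch. It is not the thesis's network: it assumes oracle (perfectly estimated) spectra, uses low-pass filtered white noise as a hypothetical stand-in for speech, and applies a single whole-signal FFT rather than a short-time transform. The helper names `mix_at_snr` and `reconstruction_snr_db` are invented for this illustration. Under those assumptions, it compares a waveform rebuilt from the true amplitude spectrum plus the noisy phase with one rebuilt from the true complex spectrum.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR (in dB)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def reconstruction_snr_db(reference, estimate):
    """SNR of `estimate` against `reference` in dB (tiny floor avoids division by zero)."""
    error_energy = np.sum((reference - estimate) ** 2) + 1e-20
    return 10.0 * np.log10(np.sum(reference ** 2) / error_energy)


rng = np.random.default_rng(0)
n_samples = 16000

# Assumption: low-pass filtered white noise stands in for a speech signal.
clean = np.convolve(rng.standard_normal(n_samples), np.ones(8) / 8.0, mode="same")
noise = rng.standard_normal(n_samples)
noisy = mix_at_snr(clean, noise, snr_db=-10.0)

clean_spec = np.fft.rfft(clean)
noisy_spec = np.fft.rfft(noisy)

# Path A: oracle amplitude spectrum combined with the phase of the noisy mixture.
amplitude_path = np.fft.irfft(
    np.abs(clean_spec) * np.exp(1j * np.angle(noisy_spec)), n=n_samples
)

# Path B: oracle complex spectrum, so amplitude and phase are both correct.
complex_path = np.fft.irfft(clean_spec, n=n_samples)

snr_amplitude_path = reconstruction_snr_db(clean, amplitude_path)
snr_complex_path = reconstruction_snr_db(clean, complex_path)
print(f"amplitude + noisy phase: {snr_amplitude_path:.1f} dB")
print(f"complex spectrum:        {snr_complex_path:.1f} dB")
```

Even with a perfect amplitude estimate, the waveform rebuilt with the noisy phase stays far from the clean signal at -10 dB, which is one way to read the motivation for adding a complex-spectrum training target alongside the amplitude target.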

Keywords/Search Tags:

Speech enhancement, extremely low signal-to-noise ratio, complex noise environments, multi-scale networks, collaborative learning

Related items

1	Research On Speech Enhancement Technology In Low Signal-to-Noise Ratio Environment
2	Extremely Low Signal-to-noise Ratio Speech Enhancement Method Based On Deep Learning
3	Speech Enhancement Algorithm Based On Deep Learning In Complex Background
4	Single-Channel Speech Enhancement Algorithm Based On Audio Feature Perception
5	Speech Enhancement Technique Research In Low SNR Conditions Based On Short-Time Spectrum Estimation
6	Single Channel Speech Enhancement Based On Generative Adversarial Networks
7	Research On Speech Enhancement And Related Technologies In Low SNR Environments
8	Research On Algorithms Of Speech Enhancement In The Low SNR And Complicated Environments
9	Research On Improved ME-MGCRN Speech Enhancement Algorithm Based On Low Signal-to-noise Ratio Case
10	Research On Speech Enhancement Method Based On DNN And MultiResU_Net