Research On Speech Enhancement Under Non-Stationary Noise

Posted on: 2023-07-13    Degree: Master    Type: Thesis
Country: China    Candidate: Z S Chen    Full Text: PDF
GTID: 2558306914963799    Subject: Computer Science and Technology

Abstract/Summary:
With the development of communication technology, people have ever higher requirements for voice quality during communication. Given the objective fact that background noise changes all the time, studying speech enhancement in non-stationary noise scenarios is of great significance for solving practical problems. Because traditional speech enhancement methods rest on the assumption that noise is stationary, their ability to handle real conditions is limited. In recent years, speech enhancement has mainly been implemented with deep learning methods. Existing speech enhancement algorithms have two main problems: (1) the loss function is too idealized and does not account for how the human ear perceives sound; (2) at the low signal-to-noise ratios found in real environments, enhancement quality suffers because time-domain and frequency-domain information is not fused. To address these problems, this thesis proposes two lines of research to improve speech enhancement performance in non-stationary noise scenarios, and designs and implements a real-time conference system with ultra-clear sound quality.

First, speech enhancement based on a perceptual loss function is proposed. The loss functions of existing speech enhancement algorithms are mostly idealized, which leads to poor enhancement results in non-stationary noise scenarios. This thesis takes full account of how the human ear perceives sound and introduces a perceptual loss function that extracts features of the speech signal and compares the enhanced signal with the original clean signal in a more detailed and comprehensive way, so that the algorithm performs well in non-stationary noise scenes.

Second, adaptive speech enhancement based on SNR supervision is proposed. In low signal-to-noise-ratio environments, the enhanced speech is poor because only a single kind of speech information is used. This thesis introduces a frequency-domain loss function to train a time-domain network, fully fusing speech information across the time-frequency domain, analyses the respective strengths and weaknesses of the time-domain and frequency-domain losses, and implements an adaptive loss-weighting strategy based on an attention mechanism, thereby improving the model's generalization ability in complex environments.

Third, a real-time conference system with ultra-clear sound quality is designed and implemented. The speech enhancement algorithms of existing conference systems are limited in capability, resulting in poor speech quality during communication. This thesis designs and implements a client that runs smoothly on Mac and Windows platforms and applies the proposed speech enhancement algorithm to improve voice quality during calls.
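To make the first idea concrete, the following is a minimal PyTorch sketch of one common form of perceptual loss: the enhanced and clean signals are compared in the feature space of a frozen auxiliary network rather than sample by sample. The `feature_net` module and the relative weight are assumptions for illustration; the abstract does not specify the thesis' actual feature extractor or loss composition.

```python
import torch
import torch.nn as nn


class PerceptualLoss(nn.Module):
    """Sketch: combine a sample-level L1 term with a feature-level term
    computed by a frozen auxiliary network (placeholder `feature_net`)."""

    def __init__(self, feature_net: nn.Module, feat_weight: float = 1.0):
        super().__init__()
        self.feature_net = feature_net.eval()
        for p in self.feature_net.parameters():
            p.requires_grad_(False)  # the feature extractor itself is not trained
        self.feat_weight = feat_weight
        self.l1 = nn.L1Loss()

    def forward(self, enhanced: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
        # Sample-level term keeps the waveform close to the clean target ...
        sample_loss = self.l1(enhanced, clean)
        # ... while the feature-level term penalizes perceptually salient
        # deviations that a plain L1/MSE loss tends to overlook.
        feat_loss = self.l1(self.feature_net(enhanced), self.feature_net(clean))
        return sample_loss + self.feat_weight * feat_loss
```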
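For the second idea, the sketch below shows one plausible way to fuse a time-domain loss with an STFT-magnitude loss and weight the two terms adaptively. Here the weights come from a small softmax gate driven by an estimated input SNR, loosely mirroring the "SNR supervision" and attention-based adaptation described above; the gate, the SNR input, and the STFT settings are all illustrative assumptions, not the thesis' actual design.

```python
import torch
import torch.nn as nn


class TimeFrequencyLoss(nn.Module):
    """Sketch: adaptively weighted sum of a time-domain L1 loss and a
    frequency-domain (STFT magnitude) L1 loss, gated by an SNR estimate."""

    def __init__(self, n_fft: int = 512, hop: int = 128):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.gate = nn.Linear(1, 2)  # maps an SNR estimate to two loss weights
        self.l1 = nn.L1Loss()

    def stft_mag(self, x: torch.Tensor) -> torch.Tensor:
        window = torch.hann_window(self.n_fft, device=x.device)
        spec = torch.stft(x, self.n_fft, self.hop, window=window, return_complex=True)
        return spec.abs()

    def forward(self, enhanced: torch.Tensor, clean: torch.Tensor,
                snr_db: torch.Tensor) -> torch.Tensor:
        time_loss = self.l1(enhanced, clean)
        freq_loss = self.l1(self.stft_mag(enhanced), self.stft_mag(clean))
        # snr_db: (batch, 1) estimated input SNR; softmax keeps the two weights
        # positive and summing to one, so neither domain is silently discarded.
        w = torch.softmax(self.gate(snr_db), dim=-1).mean(dim=0)
        return w[0] * time_loss + w[1] * freq_loss
```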
Keywords/Search Tags: speech enhancement, perceptual loss function, attention mechanism, deep learning