
Research On Speech Enhancement Algorithm Based On Deep Learning

Posted on: 2024-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
Full Text: PDF
GTID: 2568307157983069
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Speech is the primary means of human communication and information transmission. Speech quality and intelligibility are key indicators of a good listening experience and directly affect how accurately information is conveyed. In real environments, however, speech signals are often corrupted by noise, which degrades daily communication, scientific research, and the accurate transmission of commands. Improving speech quality and intelligibility through speech enhancement therefore has considerable practical and scientific value. Single-channel speech enhancement has become a major research focus because of its low cost, ease of study, and wide applicability. Traditional speech enhancement methods perform poorly under complex noise conditions, whereas deep learning-based methods can handle complex speech signals and adapt to a variety of speech scenarios, which gives them an advantage in speech enhancement tasks. Deep learning methods built on a convolutional encoder-decoder structure have been widely used for speech enhancement, but most of them take the time-domain speech signal as the network input and therefore cannot fully exploit time-frequency information. The convolutional structure extracts features within local windows and cannot capture the contextual features of speech. In addition, these methods compute the loss function from a single target (such as the frequency-domain speech signal) and do not fully exploit the differences between enhanced speech and clean speech. To address these issues, this thesis uses the speech magnitude spectrum as the model input and studies both the model structure and the loss function. The main work is as follows:

(1) This thesis proposes AU-Net (Attention-based U-Net), a speech enhancement algorithm that combines a time-frequency attention mechanism with U-Net. The algorithm takes the magnitude spectrum as input so that it can fully exploit the time-frequency information of speech. By adding time-frequency attention modules between the convolutional encoder and decoder structures, the algorithm retains the multi-scale feature fusion of the encoder-decoder structure while using attention to improve the network's ability to capture contextual information, so the network obtains richer global speech features. Experiments show that AU-Net achieves better evaluation metric scores than the baseline model.

(2) This thesis proposes a multi-objective joint loss function for speech enhancement, defined as a linear combination of a time-domain loss, a frequency-domain loss, and a PESQ loss. The weight of each loss can be adjusted to control its influence during training. Multiple comparative experiments show that each loss improves a different subset of the evaluation metrics, while the joint loss combines the advantages of all three and significantly improves the evaluation scores of AU-Net, outperforming other representative speech enhancement algorithms.

This work demonstrates that time-frequency attention mechanisms and multi-objective joint loss functions can improve a model's enhancement performance. The proposed algorithm effectively reduces speech distortion and background noise, improves speech quality and intelligibility, and outperforms most current state-of-the-art models. The results also indicate that improving the model structure and optimizing the loss function are promising research directions for speech enhancement.
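
The multi-objective joint loss in item (2) can be illustrated with a minimal NumPy sketch. It only shows the weighted linear combination; everything else is an assumption, not the thesis's implementation. The names `joint_loss`, `magnitude_spectrogram`, `w_time`, `w_freq`, and `w_pesq` are hypothetical, the time-domain and frequency-domain terms are taken to be mean squared errors, the frame sizes are arbitrary, and the PESQ term is passed in as a precomputed scalar because the abstract does not say how it is computed.

```python
import numpy as np

def magnitude_spectrogram(x, frame_len=512, hop=256):
    """Magnitude STFT with a Hann window (frame sizes are illustrative assumptions)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def joint_loss(enhanced, clean, pesq_loss, w_time=1.0, w_freq=1.0, w_pesq=1.0):
    """Weighted sum of time-domain, frequency-domain, and PESQ losses.

    pesq_loss is a scalar supplied by the caller (assumption: computed elsewhere).
    The MSE form of the first two terms is also an assumption.
    """
    # Time-domain term: mean squared error between the waveforms.
    time_loss = np.mean((enhanced - clean) ** 2)
    # Frequency-domain term: mean squared error between magnitude spectra.
    freq_loss = np.mean(
        (magnitude_spectrogram(enhanced) - magnitude_spectrogram(clean)) ** 2
    )
    # Linear combination; the weights control each term's influence.
    return w_time * time_loss + w_freq * freq_loss + w_pesq * pesq_loss
```

Setting a weight to zero removes the corresponding term, which is one way to run the kind of single-loss comparison the abstract mentions.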
Keywords/Search Tags: speech enhancement, deep learning, U-Net, multi-head attention mechanism, multi-objective joint loss function