Research On Speech Enhancement Algorithm Based On Deep Learning

Posted on:2024-08-12

Degree:Master

Type:Thesis

Country:China

Candidate:X Liu

Full Text:PDF

GTID:2568307157983069

Subject:Master of Electronic Information (Professional Degree)

Abstract/Summary:

Speech is the primary medium of human communication and information transmission. Speech quality and intelligibility are key indicators of a good listening experience and directly affect how accurately information is conveyed. In real environments, however, speech signals are often corrupted by noise, which degrades daily communication, scientific research, and the accurate transmission of commands. Improving the quality and intelligibility of speech through speech enhancement therefore has both practical and scientific value. Single-channel speech enhancement has become a research focus because of its low cost, ease of study, and wide applicability.

Traditional speech enhancement methods perform poorly under complex noise conditions, whereas deep learning-based methods can handle complex speech signals and adapt to a variety of speech scenarios, which gives them an advantage in speech enhancement tasks. Deep learning methods built on a convolutional encoder-decoder structure have been widely used for speech enhancement, but most of them take the time-domain speech signal as the network input and so cannot fully exploit time-frequency information. The convolutional structure extracts features within local windows and cannot capture the contextual features of speech. In addition, these methods compute the loss from a single target (such as the frequency-domain speech signal) and do not fully exploit the difference between enhanced speech and clean speech. To address these issues, this thesis uses the speech magnitude spectrum as the model input and studies both the model structure and the loss function. The main work is as follows:

(1) This thesis proposes AU-Net (Attention-based U-Net), a speech enhancement algorithm that combines a time-frequency attention mechanism with U-Net. The algorithm takes magnitude spectra as input and can therefore make full use of time-frequency information. Time-frequency attention modules are added between the convolutional encoder and decoder, so the algorithm keeps the multi-scale feature fusion of the encoder-decoder structure while the attention mechanism improves the network’s ability to capture contextual information, giving the network richer global speech features. Experiments show that AU-Net achieves better evaluation metric scores than the baseline model.

(2) This thesis proposes a multi-objective joint loss function for speech enhancement, defined as a linear combination of a time-domain loss, a frequency-domain loss, and a PESQ (Perceptual Evaluation of Speech Quality) loss. The weight of each loss can be adjusted to control its influence during model training. Multiple comparative experiments show that each loss favors different evaluation metrics, while the joint loss combines the advantages of all three and significantly improves the evaluation scores of AU-Net, outperforming other representative speech enhancement algorithms.

This work demonstrates that time-frequency attention mechanisms and multi-objective joint loss functions can improve the enhancement performance of the model. The proposed algorithm effectively reduces speech distortion and background noise, improves speech quality and intelligibility, and outperforms most current advanced models. It also indicates that improving the model structure and optimizing the loss function are viable research directions for improving speech enhancement performance.
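
As a minimal illustration of the multi-objective joint loss described in the abstract, the following Python sketch forms a weighted linear combination of three terms. Everything specific here is an assumption rather than the thesis's implementation: the function names, the mean-absolute-error form of the time-domain and frequency-domain terms, the naive single-frame DFT, the default weights, and the `perceptual_loss` callable, which is a hypothetical stand-in for the PESQ loss whose exact form the abstract does not give.

```python
import cmath
import math
from typing import Callable, List, Sequence

Signal = Sequence[float]


def time_domain_loss(enhanced: Signal, clean: Signal) -> float:
    """Mean absolute error between waveforms (assumed form)."""
    return sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)


def magnitude_spectrum(frame: Signal) -> List[float]:
    """Magnitudes of a naive DFT of one frame, non-negative frequencies only."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def frequency_domain_loss(enhanced: Signal, clean: Signal) -> float:
    """Mean absolute error between magnitude spectra (assumed form)."""
    mag_e = magnitude_spectrum(enhanced)
    mag_c = magnitude_spectrum(clean)
    return sum(abs(a - b) for a, b in zip(mag_e, mag_c)) / len(mag_c)


def joint_loss(
    enhanced: Signal,
    clean: Signal,
    perceptual_loss: Callable[[Signal, Signal], float],
    w_time: float = 1.0,
    w_freq: float = 1.0,
    w_pesq: float = 1.0,
) -> float:
    """Weighted linear combination of the three loss terms.

    `perceptual_loss` is a hypothetical placeholder for the PESQ-based term;
    the weights are illustrative, not values from the thesis.
    """
    return (
        w_time * time_domain_loss(enhanced, clean)
        + w_freq * frequency_domain_loss(enhanced, clean)
        + w_pesq * perceptual_loss(enhanced, clean)
    )


# Usage: a 16-sample sine as "clean" speech and a shifted copy as "enhanced".
clean_frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
enhanced_frame = [c + 0.1 for c in clean_frame]
no_perceptual_term = lambda e, c: 0.0  # placeholder: contributes nothing
loss_value = joint_loss(enhanced_frame, clean_frame, no_perceptual_term)
```

Raising one weight relative to the others makes the corresponding term dominate the total, which is the mechanism the abstract describes for controlling each loss's influence during training. In a real training setup the three terms would be computed on tensors with a differentiable STFT rather than on Python lists.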

Keywords/Search Tags:

speech enhancement, deep learning, U-Net, multi-head attention mechanism, multi-objective joint loss function

Related items

1	Research On Single Channel Speech Enhancement Based On Multi-head Attention Mechanism
2	Improved Algorithm For Speech Enhancement Based On Deep Learning
3	Research On Speech Enhancement Algorithm Based On Multi-head Attention Mechanism
4	Research On Speech Enhancement Under Non-Stationary Noise
5	Research On Speech Enhancement Method Of Gated Network Based On Multi-head Attention Mechanism
6	Research On Single Channel Speech Enhancement Technology Based On Deep Neural Network
7	Multi-Objective Learning And Ensembling For Deep Neural Network Based Speech Enhancement
8	Research And Implementation Of Single-channel Speech Enhancement Model Based On Deep Learning
9	Research On Speech Enhancement Technology Based On Multi-Objective Learning And Integration
10	Speech Enhancement Based On Perceptual Multilevel Mixed Attention Skip Connection