
Research And Implementation Of Speech Enhancement Algorithm Based On Deep Learning

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Peng
Full Text: PDF
GTID: 2428330623968138
Subject: Software engineering
Abstract/Summary:
Speech enhancement uses audio signal processing techniques and algorithms to improve the intelligibility or overall perceived quality of distorted speech signals, and thereby improves results in applications such as speech recognition, voice calls, teleconferencing, recording, military eavesdropping, and hearing aids. This thesis focuses on deep learning-based speech enhancement algorithms, that is, algorithms that use a deep learning model to learn the mapping from noisy speech to clean speech in order to improve the intelligibility and quality of noisy speech signals.

An in-depth analysis of the design ideas and modeling mechanisms of existing algorithms reveals the following shortcomings. First, the model training target does not match the evaluation metrics: the loss function generally cannot reflect the listening experience of the human ear, whereas the evaluation metrics are designed around human hearing, so the model that is optimal under the loss does not achieve the best performance on the metrics. Second, there is currently little research on speech enhancement under low signal-to-noise ratio conditions, where speech components are sparse. Current models lack a design for retaining speech information, which makes it difficult to recover complete speech and reduces the quality and intelligibility of the enhanced speech.

This thesis studies these problems and proposes corresponding solutions. The main contributions are as follows:

(1) To solve the first problem, this thesis proposes a speech enhancement algorithm based on generative adversarial neural networks and studies adversarial training in which the discriminator network learns to distinguish clean speech from noisy speech. Ideally, the discriminator learns the human auditory experience and gives the speech enhancement model feedback that matches the evaluation metrics. Experimental results show that the proposed algorithm achieves performance similar to that of related work.

(2) A speech enhancement algorithm based on RefineNet is proposed. To address the second problem, the network's RefineBlock fuses shallow and deep feature maps so that shallow speech feature information is fully used. Also in view of the second problem, it is proposed to fuse the evaluation metrics into the loss function: using a combination of metrics as the loss function makes the training objective consistent with the evaluation metrics. In the related experiments, the algorithm outperforms the benchmark models on all metrics, which demonstrates its effectiveness.

(3) To further address the second problem, an end-to-end speech enhancement algorithm based on RefineNet is proposed. Because an end-to-end model requires no feature preprocessing, all of the original speech information is retained, and a network structure that simulates a short-time Fourier transform is applied so that the neural network extracts useful features automatically. Comparison with the benchmark algorithm demonstrates the effectiveness of the algorithm.
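The abstract does not describe the network structure that simulates a short-time Fourier transform in contribution (3). One common way to realize such a layer is a strided 1-D convolution whose kernels are initialized to the cosine and sine Fourier basis. The sketch below illustrates that idea in NumPy under this assumption, with a rectangular window and fixed kernels. The function names `fourier_kernels` and `conv_stft` and the parameters `n_fft` and `hop` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def fourier_kernels(n_fft):
    """Build cosine/sine kernels of shape (n_fft // 2 + 1, n_fft).

    Assumption: these play the role of the weights of a 1-D convolution
    layer; in a trainable model they would only be the initial values.
    """
    k = np.arange(n_fft // 2 + 1)[:, None]   # frequency bin index
    n = np.arange(n_fft)[None, :]            # sample index within a frame
    angle = 2.0 * np.pi * k * n / n_fft
    return np.cos(angle), -np.sin(angle)     # real part, imaginary part


def conv_stft(signal, n_fft=64, hop=16):
    """Strided 1-D convolution (cross-correlation) with Fourier kernels.

    Returns the real and imaginary parts, each of shape
    (n_frames, n_fft // 2 + 1). No padding, rectangular window.
    """
    real_k, imag_k = fourier_kernels(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Each row is one receptive field of the strided convolution.
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] for i in range(n_frames)]
    )
    return frames @ real_k.T, frames @ imag_k.T
```

With fixed kernels, this layer reproduces the FFT of each frame. If the kernels were made learnable, a network could adjust them during training, which is one plausible reading of "automatically extract valid features".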
Keywords/Search Tags: speech enhancement, generative adversarial neural networks, RefineNet, end-to-end