A Speech Enhancement System Based On Deep Learning And Parallel Computing

Posted on: 2017-01-19  Degree: Master  Type: Thesis
Country: China  Candidate: B T Shi
GTID: 2308330485961598  Subject: Computer technology
Abstract/Summary:
In recent years, with growing computing capability and the advent of better optimization algorithms, real-time speech recognition has gradually become practical. Although the accuracy of close-talk recognition has improved greatly, obstacles to daily use remain, mainly because speech in real environments is usually corrupted by noise. Speech enhancement techniques aim to process a noisy audio signal and reconstruct the clean speech from the mixture.

Speech Enhancement, also known as Speech Separation or Speech De-noising, is a technique that extracts the useful speech signal from noisy voice data while suppressing the noise. As a result, it tends to increase the signal-to-noise ratio (SNR) of the input signal and improve speech intelligibility. Speech Enhancement is an essential preprocessing step for many applications in both research and industry, e.g. automatic speech recognition (ASR), hearing aids, and telecommunication. Its effectiveness therefore has a profound impact on these applications.

There are many algorithms for speech enhancement. Most of them rely on digital signal processing (DSP) methods built on complex models and algorithms. In this study, I use a deep neural network (DNN) with parallel accelerated computing to solve the speech enhancement problem.

A deep neural network extends the artificial neural network (ANN) with more hidden layers and maps the input features to the target. Given enough data, a DNN can dispense with manual feature extraction and learn a useful representation directly from the raw input. This capacity for representation learning is a strong advantage.

One practical problem of DNNs is their high computational cost, which limits their application. However, this cost is gradually becoming acceptable as computing capability develops. With the emergence of general-purpose graphics processing units (GPGPU), parallel computing has developed quickly. Most DNN algorithms can be decomposed into matrix operations, which makes GPGPUs well suited to executing them.

However, neural network models are prone to overfitting. It is hard to train a single model that enhances speech mixed with any type of noise. In this study, I propose a Multi-Model Speech Enhancement System that combines the Ideal Ratio Mask and the Reconstruction Error to select the correct model. The proposed structure greatly improves generalization.

The innovation of this study is mainly reflected in the following aspects. First, artificial neural networks are used for speech enhancement. Second, an algorithm is proposed to solve the generalization problem by utilizing a multi-model framework. This structure introduces three different neural network models to reinforce the effect of model matching. In addition, this study uses a GPU to speed up training and proposes an algorithm named dynamic noise mix to overcome the memory bottleneck of a single GPU.

The experimental results show that the proposed system achieves an average PESQ improvement of 0.3274 and a best-case improvement of 0.3647.
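
As an illustration of two ideas named in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes that "dynamic noise mix" means mixing clean speech with noise on the fly at a chosen SNR instead of storing every noisy mixture in memory, and it uses one commonly cited form of the Ideal Ratio Mask, sqrt(S / (S + N)) per time-frequency bin, where S and N are speech and noise power. The function names `mix_at_snr`, `snr_db`, and `ideal_ratio_mask` are hypothetical and do not come from the thesis.

```python
import math
import random


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals
    `target_snr_db`, then add it to `speech`.

    Assumption: both inputs are equal-length lists of samples.
    Returns (mixture, scaled_noise).
    """
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Solve p_speech / (gain^2 * p_noise) = 10^(SNR/10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels for equal-length signals."""
    p_speech = sum(s * s for s in speech)
    p_noise = sum(n * n for n in noise)
    return 10 * math.log10(p_speech / p_noise)


def ideal_ratio_mask(speech_power, noise_power):
    """Per-bin mask sqrt(S / (S + N)); bins with no energy get 0.

    Assumption: inputs are per-bin power values (e.g. squared
    spectrogram magnitudes) of the clean speech and the noise.
    """
    return [
        math.sqrt(s / (s + n)) if (s + n) > 0 else 0.0
        for s, n in zip(speech_power, noise_power)
    ]


if __name__ == "__main__":
    random.seed(0)
    speech = [math.sin(0.1 * i) for i in range(1000)]
    noise = [random.uniform(-1.0, 1.0) for _ in range(1000)]
    mixture, scaled = mix_at_snr(speech, noise, 5.0)
    print(round(snr_db(speech, scaled), 6))
    print(ideal_ratio_mask([4.0, 0.0, 1.0], [0.0, 1.0, 1.0]))
```

Under this reading, each training batch can draw a fresh noise segment and SNR, so only the clean speech and the noise recordings need to be held in memory. The mask is bounded between 0 (noise-only bin) and 1 (speech-only bin), which makes it a convenient regression target for a network.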
Keywords/Search Tags: Deep Neural Network, Parallel Computing, Speech Enhancement, Ideal Binary Mask