
Research On Speech Bandwidth Extension Methods Using Neural Networks

Posted on: 2018-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gu
Full Text: PDF
GTID: 2348330512485627
Subject: Signal and Information Processing
Abstract/Summary:
Speech bandwidth extension (BWE) techniques aim to automatically restore the missing high-frequency components of narrowband speech by exploiting the correlation between the low-frequency and high-frequency parts of wideband speech. The bandwidth of a speech signal is usually limited to a narrow band of frequencies by the restrictions of speech acquisition equipment and transmission systems. The absence of the high-frequency components leads to a muffled sound and seriously degrades speech quality, naturalness and speaker similarity. Conventional statistical models such as Gaussian mixture models (GMMs) are used to model the mapping from narrowband speech features to high-frequency features. However, because of the over-smoothing effect, GMM-based methods lose spectral detail and produce muffled speech. Compared with GMMs, neural networks have better acoustic modeling ability and have significantly improved the naturalness, intelligibility and quality of the generated speech in various speech generation tasks, such as speech enhancement and voice conversion. This dissertation studies BWE approaches based on several kinds of neural networks. The main work is as follows.

Firstly, BWE methods using deep neural networks (DNNs) are studied. Several pre-training and training strategies are employed to restore high-frequency spectral envelopes from low-frequency ones. Experimental results show that the neural-network-based BWE methods proposed in this dissertation achieve better performance than the GMM-based method in both objective and subjective tests. Furthermore, a multi-task learning BWE approach with narrowband speech state classification as the secondary task is presented, which achieves better prediction accuracy than the single-task BWE regression method.

Secondly, this dissertation presents a novel BWE method using deep recurrent neural networks (RNNs). RNNs incorporating long short-term memory (LSTM) cells are adopted to model the complex mapping between the feature sequences describing the low-frequency and high-frequency spectra. To utilize linguistic information when predicting high-frequency spectral components, bottleneck (BN) features derived from a DNN-based state classifier for narrowband speech are employed as auxiliary inputs. Experimental results show that the proposed BWE methods achieve better performance than the conventional GMM-based and DNN-based methods in both objective and subjective tests.

Thirdly, this dissertation presents a waveform modeling and generation method for BWE using stacked dilated convolutional neural networks (CNNs). Unlike conventional frame-based BWE approaches, the proposed methods avoid the spectral conversion and phase estimation problems by modeling the speech waveforms directly, and they achieve better performance in subjective preference tests. A BWE approach based on conditional dilated CNNs, which employs BN features as additional conditioning information, is also proposed. Furthermore, an efficiency optimization strategy using recursive convolution architectures is applied to reduce the model size.

Finally, this dissertation introduces several statistical parametric speech synthesis methods combined with BWE approaches. A BWE system based on dilated CNNs is used to construct high-sampling-rate parametric speech synthesis systems from only a low-sampling-rate speech corpus. The proposed TTS approaches achieve speech quality equivalent to that of a speech synthesis system built on a natural high-sampling-rate speech corpus.
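
The abstract does not give the configuration of the stacked dilated CNNs, so the following is only an illustrative sketch of the general idea: stacking convolutions whose dilation grows per layer lets one output sample depend on a long stretch of input waveform with few parameters. The kernel size of 2, the doubling dilation schedule, the causal (left-padded) form and the function names are all assumptions made for illustration, not details taken from the dissertation.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output sample
    of a stack of dilated convolutions (one layer per dilation)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_causal_conv(signal, weights, dilation):
    """1-D causal dilated convolution:
    y[t] = sum_k weights[k] * x[t - k * dilation],
    treating samples before the start of the signal as zero."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


# Hypothetical schedule: kernel size 2, dilation doubling in each layer.
dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
rf = receptive_field(2, dilations)
print(rf)  # 1024 samples, i.e. 64 ms of context at a 16 kHz sampling rate

# Pushing a unit impulse through three layers with all-ones weights
# shows the context growing: the response spans 1 + 1 + 2 + 4 = 8 samples.
x = [1.0] + [0.0] * 15
for d in [1, 2, 4]:
    x = dilated_causal_conv(x, [1.0, 1.0], d)
print(x)
```

With ten layers the context reaches 1024 samples, whereas ten undilated layers of the same kernel size would cover only 11.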
Keywords/Search Tags: speech bandwidth extension, deep neural networks, recurrent neural networks, stacked dilated convolutional neural networks, statistical parametric speech synthesis