
Research On Speech Bandwidth Extension Using Deep Neural Network

Posted on: 2022-05-02  Degree: Master  Type: Thesis
Country: China  Candidate: X P Ling  Full Text: PDF
GTID: 2518306524498504  Subject: Electronics and Communications Engineering
Abstract/Summary:
In the digital public switched telephone network, the bandwidth of the speech signal is usually limited to the narrowband range of 0 Hz to 4 kHz by the voice acquisition equipment, the codec, and the channel bandwidth, so the high-frequency content of the speech is lost. The resulting narrowband speech often sounds muffled and dull, and the missing high frequencies seriously degrade emotion recognition, speaker recognition, and pronunciation recognition. Speech bandwidth extension aims to restore the high-frequency spectrum missing from narrowband speech and thereby improve speech quality and clarity.

Traditional speech bandwidth extension techniques are mostly based on the source-filter model of speech production. They divide the task into high-band spectral envelope estimation and excitation signal generation, using methods such as codebook mapping, Gaussian mixture models, and hidden Markov models. Because the performance of these methods depends strongly on the dimensionality of the acoustic features, and their ability to model the relation between the narrowband and high-frequency spectra is limited, the reconstructed wideband speech suffers from many artifacts. In recent years, with the rise of deep learning, more and more neural network models have been applied successfully to speech bandwidth extension. However, truly high-performance and efficient neural network models that can be deployed conveniently on practical devices still require further research. This thesis aims to improve both the quality and the real-time performance of neural network models for speech bandwidth extension. The specific research contributions are as follows.

First, this thesis proposes a speech bandwidth extension method based on a temporal convolutional network. To address the weakness of ordinary neural networks in modeling time-series data, dilated causal convolutions are used to build a temporal convolutional network that models the nonlinear relation between the time-domain narrowband waveform and the time-domain wideband waveform, yielding good reconstructed wideband speech quality. Whereas conventional training uses the sample-level L1 or L2 distance between the model prediction and the target as the loss function, this thesis further proposes a time-frequency loss function that drives the optimization from both the time-domain and the frequency-domain perspectives and further improves bandwidth extension performance.
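As a rough illustration of the first contribution, the following is a minimal sketch of a dilated causal temporal convolutional network together with a combined time-frequency loss. PyTorch is assumed as the framework, and all layer widths, kernel sizes, the STFT size, and the weighting factor alpha are illustrative assumptions, not values taken from the thesis.

```python
# Minimal sketch only; framework (PyTorch) and all hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalBlock(nn.Module):
    """One dilated causal 1-D convolution block with a residual connection."""
    def __init__(self, channels, dilation, kernel_size=3):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding keeps the conv causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, samples)
        y = F.pad(x, (self.pad, 0))                      # pad only on the past side
        return F.relu(self.conv(y)) + x                  # residual connection

class TCNBandwidthExtender(nn.Module):
    """Maps a time-domain narrowband waveform to a wideband waveform estimate."""
    def __init__(self, channels=64, num_blocks=8):
        super().__init__()
        self.inp = nn.Conv1d(1, channels, 1)
        self.blocks = nn.Sequential(
            *[DilatedCausalBlock(channels, dilation=2 ** i) for i in range(num_blocks)]
        )
        self.out = nn.Conv1d(channels, 1, 1)

    def forward(self, x):                                # x: (batch, 1, samples)
        return self.out(self.blocks(self.inp(x)))

def time_frequency_loss(pred, target, alpha=0.5, n_fft=512):
    """Combine a sample-level time-domain distance with an STFT magnitude distance."""
    time_loss = F.l1_loss(pred, target)
    window = torch.hann_window(n_fft, device=pred.device)
    pred_mag = torch.stft(pred.squeeze(1), n_fft, window=window, return_complex=True).abs()
    target_mag = torch.stft(target.squeeze(1), n_fft, window=window, return_complex=True).abs()
    freq_loss = F.l1_loss(pred_mag, target_mag)
    return alpha * time_loss + (1 - alpha) * freq_loss
```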
Secondly, this thesis proposes a speech bandwidth extension method based on an encoder-decoder network. To counter the large parameter count and high algorithmic complexity of neural network models, the encoder network extracts features from the high-dimensional input and reduces its dimensionality, while the decoder network recovers the wideband speech. A long short-term memory network is placed in the bottleneck layer between the encoder and the decoder, which strengthens the model's ability to learn the context of time-series data, and the method obtains good subjective and objective evaluation scores. To guide the update of the model's weight parameters more comprehensively, a time-frequency perceptual loss function is further proposed, which improves the fitting accuracy of the model in the time, frequency, and perceptual domains.

Finally, this thesis proposes a speech bandwidth extension method based on a time-frequency perception network. The time-frequency perception network also adopts the encoder-decoder structure: both the encoder and the decoder use dilated convolutional neural networks, and a locality-sensitive-hashing self-attention layer is used in the bottleneck. This improves the extraction of acoustic features from the sequential speech data in the encoder and the bottleneck, and strengthens the decoder's ability to reconstruct wideband speech. In addition, to further improve the fitting ability of the model, a deep time-frequency perceptual loss function is proposed. The resulting method outperforms both traditional speech bandwidth extension methods and classical neural-network bandwidth extension methods.
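The following is a minimal sketch of the encoder-decoder idea behind the second and third contributions: a convolutional encoder, a self-attention bottleneck, and a convolutional decoder. PyTorch is assumed; the thesis uses dilated convolutions with locality-sensitive-hashing self-attention (and, in the second method, an LSTM bottleneck), whereas this sketch substitutes standard multi-head self-attention and ordinary strided/transposed convolutions, with all layer sizes chosen only for illustration.

```python
# Minimal sketch only; standard multi-head attention stands in for the thesis's
# LSH self-attention, and all layer sizes are assumptions.
import torch
import torch.nn as nn

class TimeFrequencyPerceptionNet(nn.Module):
    """Encoder-decoder bandwidth extender with a self-attention bottleneck."""
    def __init__(self, channels=64, num_heads=4):
        super().__init__()
        # Convolutional encoder (second layer dilated): downsample the waveform to a latent sequence.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, stride=2, padding=8, dilation=2), nn.ReLU(),
        )
        # Bottleneck: self-attention over the latent time steps.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Transposed-convolution decoder: upsample the latent sequence back to a waveform.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(channels, channels, kernel_size=9, stride=2,
                               padding=4, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=9, stride=2,
                               padding=4, output_padding=1),
        )

    def forward(self, x):                      # x: (batch, 1, samples)
        h = self.encoder(x)                    # (batch, channels, frames)
        h_t = h.transpose(1, 2)                # (batch, frames, channels) for attention
        a, _ = self.attn(h_t, h_t, h_t)
        h = (h_t + a).transpose(1, 2)          # residual connection, back to conv layout
        return self.decoder(h)

# Usage: a dummy narrowband waveform in, a same-length wideband waveform estimate out.
wave_nb = torch.randn(1, 1, 16000)
wave_wb = TimeFrequencyPerceptionNet()(wave_nb)
```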
Keywords/Search Tags: speech bandwidth extension, temporal convolutional neural network, encoder-decoder neural network, time-frequency loss function, time-frequency perceptual loss function