
Research On Speech Bandwidth Extension Based On Flatten Frequency Domain Network

Posted on: 2023-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lei
Full Text: PDF
GTID: 2568306830486394
Subject: Signal and Information Processing
Abstract/Summary:
Speech bandwidth extension aims to recover the missing high-frequency components of narrowband speech by exploiting the mathematical relationship between narrowband and wideband speech. The technology can increase the bandwidth of speech carried over narrowband channels such as the PSTN, online conferencing, and Bluetooth. It can also be used in media production, for example in restoring old audio, and can be combined with other speech processing tasks such as speech recognition to improve their performance. Existing deep-learning-based speech bandwidth extension algorithms fall into two main research directions: time domain and frequency domain. Time-domain algorithms model the waveform directly, but waveform envelopes vary widely and follow very complex patterns. Frequency-domain algorithms model frequency-domain features, and a frequency-domain feature map can intuitively display the deep speech information hidden beneath the waveform. This thesis therefore takes the frequency-domain approach as its research direction and makes the following three contributions.

First, we propose a speech bandwidth extension algorithm based on Flatten-FFTNet-IESC. To address the problem that existing mainstream frequency-domain networks are difficult to extend and do not use time-axis information, we propose a Flatten processing method that removes the last point of the frequency axis and merges the time and frequency axes into a single axis, so that the input and output dimensions of the frequency-domain network are exactly the same as those of a time-domain network. To address the insufficient feature extraction capability of existing mainstream frequency-domain networks, we propose using the FFTNet multi-way split network and adding the IESC structure. Experimental results show that Flatten-FFTNet-IESC greatly improves the evaluation metrics, but the network dimension is high and training is costly.

Second, we propose a speech bandwidth extension algorithm based on Flatten-CNN. First, the Flatten processing method is used to simplify network construction and to make use of time-axis information. Then, to address the excessively high dimension of the Flatten-FFTNet-IESC network, we propose using a convolutional neural network with an encoder-decoder structure to reduce the dimension and the training cost. Finally, to make use of time-domain information, a time-frequency loss is introduced into the loss function. Experimental results show that Flatten-CNN reduces the training cost while largely maintaining the network's feature extraction ability, but the details of the generated log power spectrum still leave room for improvement.

Third, we propose a speech bandwidth extension algorithm based on Flatten-WGAN-GP. To further improve the detail of the log power spectrum generated by Flatten-CNN, the Wasserstein Generative Adversarial Network (WGAN) is introduced. When a generative adversarial network reaches Nash equilibrium, the discriminator has difficulty distinguishing real samples from generated ones, so the details of the generated log power spectrum move steadily closer to those of the real log power spectrum. To address the unreasonable parameter distribution and reduced discriminative ability of the WGAN discriminator, we propose using a gradient penalty instead of weight clipping. Experimental results show that Flatten-WGAN-GP improves the detail of the log power spectrum, and that WGAN-GP achieves better evaluation metrics and faster training than WGAN.
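
The Flatten processing method described in the abstract (remove the last frequency point, then merge the time and frequency axes into one) might look roughly like the NumPy sketch below. The abstract does not give the details, so everything here is an assumption made for illustration: the frame length `n_fft=512`, the hop of 256 samples, the Hann window, the time-major ordering of the flattened vector, and the way `unflatten` re-creates the dropped bin by repeating its neighbour. The function names are hypothetical and are not taken from the thesis.

```python
import numpy as np

def log_power_spectrum(signal, n_fft=512, hop=256):
    """Return the log power spectrum, shape (frames, n_fft // 2 + 1).

    Frame length, hop, and window are illustrative assumptions.
    """
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)
    # Small constant avoids log(0) on silent bins.
    return np.log(np.abs(spectrum) ** 2 + 1e-10)

def flatten(lps):
    """Drop the last frequency bin, then merge time and frequency into one axis."""
    return lps[:, :-1].reshape(-1)

def unflatten(flat, n_bins):
    """Restore the 2-D layout; the dropped bin is approximated by
    repeating the neighbouring bin (an assumption, not the thesis's rule)."""
    trimmed = flat.reshape(-1, n_bins)
    return np.concatenate([trimmed, trimmed[:, -1:]], axis=1)

rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)       # stand-in for a speech segment
lps = log_power_spectrum(signal)         # 2-D: frames x (n_fft // 2 + 1) bins
flat = flatten(lps)                      # 1-D: frames * (n_fft // 2) values
restored = unflatten(flat, n_bins=lps.shape[1] - 1)
```

With these assumed settings each frame contributes `n_fft // 2` values after the last bin is dropped, which equals the hop size, so the flattened vector is a one-dimensional sequence of about the same length as the waveform it came from. That would be one way to obtain the property the abstract describes, namely frequency-domain inputs and outputs with the same dimensions as a time-domain network.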
Keywords/Search Tags: speech bandwidth extension, frequency domain algorithms, the Flatten processing method, FFTNet-IESC, Convolutional Neural Network, encoder-decoder, time-frequency loss, Wasserstein Generative Adversarial Network, gradient penalty