
Research On Ultra-low Bit-rate Speech Coding And Speech Enhancement

Posted on: 2019-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W B Jiang
GTID: 1368330590470391
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the development of signal processing technology, speech communication systems and speech recognition systems have achieved good results under ideal conditions. However, in complex application scenarios, such as low-rate speech communication with limited bandwidth and speech recognition under strong noise interference, the performance of existing systems degrades considerably. Low bit-rate speech coding reduces speech intelligibility because of quantization error, and intelligibility deteriorates further as the bit rate decreases, which places higher demands on speech codecs that combine high intelligibility with a low bit rate. Environmental noise in practical systems greatly reduces speech quality and recognition rate, which in turn places increasing demands on speech enhancement (or noise reduction) technology. This dissertation focuses on two problems in complex scenarios: low bit-rate speech coding and speech enhancement.

For very low bit-rate speech coding, this dissertation studies a high-intelligibility, low-rate speech coding method that requires only a single quantized coding parameter, and implements a semantic-layer coding and decoding method on this basis. The details are as follows:

A low-rate speech codec framework using Mel-frequency cepstral coefficients (MFCCs) is proposed. The framework needs only one parameter to represent the speech signal, so joint vector quantization across parameters does not have to be considered when implementing very low bit-rate quantization, which greatly simplifies the design of the quantizer. To reconstruct high-quality speech from the MFCCs, voicing classification and pitch period estimation using a Gaussian mixture model are implemented in the decoder, and an improved iterative magnitude-spectrum approximation method uses the voicing classification and pitch to obtain the time-domain speech. The reconstruction method makes full use of the characteristics of the speech signal: the time-domain signal is initialized to a minimum-phase signal or a synthesized-phase signal, which yields high-quality reconstructed speech and accelerates the convergence of the iterative algorithm.

Based on the MFCC coding method, a semantic-layer low bit-rate coding method using a deep neural network (DNN) is proposed. In implementing the DNN-based semantic-layer codec, the reconstruction of signals from speech parameters and the quantization of high-dimensional data are studied. A restricted Boltzmann machine structure is used to extract semantic-layer features of the speech signal, and semantic-layer reconstruction of the signal power spectrum is realized. A deep autoencoder is used to quantize the high-dimensional data, and a vector quantization method that combines a conventional encoder with a neural network decoder is implemented. By using the DNN to reconstruct the speech signal and to quantize high-dimensional vectors, a semantic-level low bit-rate speech codec with high intelligibility is implemented.

To address speech enhancement in complex environments, this dissertation studies a single-channel speech enhancement method that incorporates speaker-specific information, and implements a noise-robust multi-channel spatial filtering method that does not rely on direction-of-arrival (DOA) estimation. The details are as follows:

A single-channel speech enhancement algorithm that integrates speaker-dependent information is implemented. This work studies noise estimation, noise classification, noise-robust speaker recognition, and the extraction and fusion of speaker-dependent information, and proposes a noise estimation method based on an adaptive Gaussian model and a noise classification method using parameter-domain features. For typical noise environments, a speaker recognition method matched to the corresponding speaker model and a method of passing information extracted from the speaker model to the speech enhancement algorithm are established, which reduces the dependence on the noise estimation algorithm and effectively enhances the speech signal.

Noise-robust spatial filtering of array signals is studied, including an improved beamforming algorithm that does not depend on DOA estimation and a blind beamforming algorithm that uses generalized eigenvalue decomposition and is based on the minimum variance distortionless response (MVDR). The key to robust beamforming is time-frequency mask estimation for signal and noise. To address the shortcomings of traditional time-frequency mask estimation algorithms, a mask estimation algorithm using a real Gaussian model in the power-spectrum domain is implemented. Compared with the traditional time-frequency mask estimation method using a complex Gaussian model, its computational complexity is greatly reduced. A time-frequency mask estimation algorithm based on a deep neural network is also implemented; it uses multi-objective training and integrates array spatial information. Compared with related methods, the accuracy of the time-frequency mask is greatly improved.

In summary, this dissertation studies the key issues of ultra-low bit-rate speech coding and speech enhancement, proposes an MFCC-based speech coding framework, and implements a semantic-layer speech coding method. It also proposes a speech enhancement algorithm that fuses speaker-dependent information and implements a noise-robust spatial denoising method. This dissertation provides a theoretical basis and practical reference for the application of very low bit-rate speech coding and speech enhancement technology.
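
To make the idea of mask-driven beamforming without DOA estimation more concrete, the following is a minimal NumPy sketch, not the dissertation's own algorithm. It assumes the time-frequency masks are already available (from any estimator) and uses one commonly used DOA-free MVDR formulation, in which the filter is computed from mask-weighted spatial covariance matrices and a reference microphone. The function names, array layouts, the `ref_channel` parameter, and the diagonal loading constant are all illustrative assumptions.

```python
import numpy as np


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array, shape (channels, freqs, frames)
    mask: real array in [0, 1], shape (freqs, frames)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask * x x^H, separately for every frequency bin.
    cov = np.einsum("cft,dft->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=-1), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(cov_speech, cov_noise, ref_channel=0, diag_load=1e-6):
    """MVDR filter from covariances only (no steering vector, no DOA).

    Computes w = (Phi_n^{-1} Phi_s) e_ref / trace(Phi_n^{-1} Phi_s)
    for each frequency bin. Returns shape (freqs, channels).
    """
    n_ch = cov_noise.shape[-1]
    # Diagonal loading (an assumed regularizer) keeps the solve stable.
    trace_n = np.trace(cov_noise, axis1=-2, axis2=-1).real
    loaded = cov_noise + diag_load * (trace_n / n_ch)[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_speech)  # (freqs, ch, ch)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref_channel] / (trace[:, None] + 1e-10)


def apply_beamformer(weights, stft):
    """Apply y[f, t] = w[f]^H x[:, f, t]. Returns shape (freqs, frames)."""
    return np.einsum("fc,cft->ft", weights.conj(), stft)
```

In use, a speech mask and a noise mask would each be passed to `spatial_covariance`, the two results to `mvdr_weights`, and the weights to `apply_beamformer`. The quality of the output then depends mainly on the masks, which is consistent with the abstract's statement that mask estimation is the key to robust beamforming.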
Keywords/Search Tags: Speech signal processing, ultra-low bit-rate speech coding, speech enhancement, beamforming, deep neural network