Voice communication is one of the main services of a communication system, and speech coding is an important part of voice communication. The analog audio signal can be converted into a digital signal through quantization coding, which effectively saves bandwidth resources during transmission. The vocoder is a useful tool for realizing low-bit-rate speech coding, and the linear predictive vocoder is widely used because of its simple algorithm. The linear predictive vocoder achieves low-bit-rate transmission of speech by extracting the linear prediction parameters of the input speech signal, but the naturalness of the output speech is poor. In recent years, the wide-scale application of deep learning has prompted many researchers to apply neural networks to speech signal processing. At present, many neural network models perform prominently in speech recognition and speech synthesis, but they have not been studied in depth in the field of speech coding. Therefore, this thesis focuses on low-bit-rate speech coding algorithms based on deep learning and linear prediction, concentrating on the analysis and synthesis of speech, with the aim of building a high-quality and efficient speech decomposition and synthesis model.

WaveNet and WaveRNN are two typical neural vocoders that can synthesize high-quality speech. This thesis studies these two models experimentally: both are trained on an open-source corpus and their performance is analyzed. The results show that the WaveNet model has high algorithmic complexity and therefore a high training cost, while the WaveRNN model outputs higher-quality speech in the same experimental environment. After these baseline tests, this thesis focuses on LPCNet, a WaveRNN-based linear predictive neural vocoder. By changing the feature parameter dimension of the vocoder front-end module, its performance under different feature types and dimensions is studied; by changing the frequency band division and pitch range of the speech signal, the vocoder achieves high-quality analysis and synthesis of narrowband speech at a sampling rate of 8000 Hz. The perceptual evaluation of speech quality (PESQ) score of the reconstructed speech reaches 3.106, which broadens the vocoder's range of practical applications.

Secondly, to address the low accuracy of the linear prediction algorithm in the LPCNet vocoder, a traditional vocoder, the mixed excitation linear prediction (MELP) algorithm, is analyzed, and its method of calculating the linear prediction parameters is applied to the LPCNet neural vocoder, which effectively reduces the distortion of the neural vocoder. This thesis then studies the neural network itself, examining the overfitting of the LPCNet network and the effect of learning rate changes on network convergence. The results show that by dynamically adjusting the learning rate, the network converges further within the same training time, and the PESQ score of the output speech reaches 3.116.
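As a concrete illustration of the dynamic learning-rate adjustment mentioned above, the following sketch uses a Keras-style exponential decay schedule; the decay parameters and the build_lpcnet_model() constructor are assumptions for illustration, not values taken from the thesis.

    # Minimal sketch (assumed schedule, not the thesis's exact settings):
    # exponential learning-rate decay with the Keras Adam optimizer.
    import tensorflow as tf

    def make_optimizer(initial_lr=1e-3, decay_steps=10000, decay_rate=0.96):
        # The learning rate shrinks by `decay_rate` every `decay_steps` steps,
        # so later updates are smaller and convergence is smoother.
        schedule = tf.keras.optimizers.schedules.ExponentialDecay(
            initial_learning_rate=initial_lr,
            decay_steps=decay_steps,
            decay_rate=decay_rate)
        return tf.keras.optimizers.Adam(learning_rate=schedule)

    # Usage with a hypothetical model constructor (placeholder name):
    # model = build_lpcnet_model()
    # model.compile(optimizer=make_optimizer(),
    #               loss="sparse_categorical_crossentropy")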
Finally, this thesis constructs linear predictive neural vocoders based on the long short-term memory (LSTM) network and the temporal convolutional network (TCN), and explores through experiments how different network modules affect the performance of the linear predictive neural vocoder. The results show that although the temporal convolutional network has lower complexity, the quality of the synthesized speech is poor, whereas the long short-term memory network achieves high-quality analysis and synthesis of speech. Compared with the gated recurrent unit of LPCNet, the LSTM network outputs higher-quality Chinese speech, with a PESQ score of 3.220 for the output Chinese speech.
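The PESQ metric used throughout is the ITU-T P.862 measure. As one illustration, the sketch below computes a narrowband score for 8000 Hz speech using the open-source pesq Python package, with hypothetical file names standing in for the actual test material.

    # Minimal sketch (assumed tooling, hypothetical file names): narrowband PESQ
    # between a reference recording and the vocoder's reconstructed output.
    import soundfile as sf
    from pesq import pesq

    ref, fs_ref = sf.read("reference_8k.wav")      # clean input speech, 8000 Hz
    deg, fs_deg = sf.read("reconstructed_8k.wav")  # speech synthesized by the vocoder
    assert fs_ref == fs_deg == 8000

    # 'nb' selects the narrowband PESQ model appropriate for 8 kHz speech.
    score = pesq(fs_ref, ref, deg, 'nb')
    print(f"PESQ (narrowband): {score:.3f}")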