Research On Methods Of Improving Low Bit Rate Speech Coding Quality Based On Deep Learning

Posted on: 2018-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhao
GTID: 2348330518498904
Subject: Communication and Information System
Abstract/Summary:
Speech communication plays an important role in multimedia communications, and low-bit-rate speech coding is widely used in narrowband and secure speech communication because it requires fewer frequency resources. However, a lower coding rate means that less information is carried, so compression inevitably degrades speech quality to some extent. Guaranteeing the quality of reconstructed speech in low-bit-rate speech communication therefore remains a challenging task. Deep learning (DL) has extended traditional neural networks into deeper and more complex deep neural networks (DNNs), and in recent years it has proved highly effective in speech processing, outperforming earlier approaches in tasks such as speech recognition and speech separation. Building on this, the thesis proposes two new DL-based speech processing methods to improve the quality of low-bit-rate speech coding.

Speech classification has a great impact on the quality of speech coding, but the accuracy of traditional classification methods declines sharply in the presence of background noise. The first part of the thesis therefore designs a DL-based speech classification method that increases classification accuracy in different noise environments. A stacked autoencoder (SAE), a model that often outperforms others in classification tasks, is used. In the speech classification experiment, parameters such as pitch, line spectral frequencies (LSF), and sub-band energies are extracted under different signal-to-noise ratios (SNRs), normalized, and fed to the SAE in random order. The SAE is pre-trained layer by layer without supervision and then fine-tuned with standard supervised back-propagation (BP) to improve accuracy. The labels are the voiced/unvoiced flags extracted from the noise-free speech.

The bandwidth of the speech signal is another major factor affecting the quality of speech coding. If the speech bandwidth is limited or reduced by speech compression or by the communication channel, the reconstructed or transmitted speech may lose some naturalness. The second part of the thesis therefore designs a DL-based bandwidth expansion method that improves speech quality by restoring naturalness when bandwidth has been lost. Bandwidth expansion of the narrowband speech signal is performed with a feed-forward neural network (FFNN) at the output of the codec. First, an FFT is applied to the decoded signal to obtain the narrowband spectral envelope, which serves as the network input, while the high-band spectral envelope of the wideband speech serves as the desired output. The FFNN maps the narrowband envelope to the high-band envelope, and the high-band phase is obtained by flipping the narrowband phase. The envelope and phase are combined to reconstruct the high-frequency spectrum, which is converted back to the time domain by an IFFT and combined with the original narrowband speech to obtain the wideband signal.

Experiments were carried out to evaluate both methods. For voiced/unvoiced classification, the proposed algorithm improves accuracy in noise environments at different SNRs, especially at low SNRs. For example, when the proposed classifier is applied to the mixed excitation linear prediction (MELP) speech coding system, the voiced/unvoiced classification accuracy increases under different noise conditions. For bandwidth extension, performance is measured by the log spectral distortion (LSD), which on the test data decreases from 2.2372 dB before bandwidth expansion to 0.8883 dB after it. These results indicate that DNNs can improve the quality of low-bit-rate speech coding and suggest that DL has great potential in speech compression and processing.
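The SAE training procedure described in the abstract (normalized features presented in random order, unsupervised layer-by-layer pre-training, then supervised BP fine-tuning for the voiced/unvoiced decision) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption for illustration, not the thesis's configuration: min-max normalization, the layer sizes, learning rate and epoch counts, the squared-error reconstruction loss, the sigmoid output with cross-entropy loss, and the hypothetical helper names `Dense`, `pretrain`, `train_sae_classifier` and `predict`. Feature extraction (pitch, LSF, sub-band energies at various SNRs) is not shown.

```python
import math
import random


def sigmoid(z):
    # Clamp to avoid math.exp overflow for large-magnitude inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class Dense:
    """Fully connected layer with sigmoid activation (illustrative helper)."""

    def __init__(self, n_in, n_out, rng):
        s = 1.0 / math.sqrt(n_in)
        self.W = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b)]

    def grad_input(self, delta):
        # Gradient w.r.t. the layer input, given the gradient w.r.t. its pre-activations.
        return [sum(self.W[j][i] * d for j, d in enumerate(delta))
                for i in range(len(self.W[0]))]

    def step(self, x, delta, lr):
        # Plain SGD update of weights and biases.
        for j, d in enumerate(delta):
            for i, xi in enumerate(x):
                self.W[j][i] -= lr * d * xi
            self.b[j] -= lr * d


def minmax_normalize(X):
    # Assumed normalization: scale each feature to [0, 1] so sigmoid decoders can reconstruct it.
    lo = [min(col) for col in zip(*X)]
    scale = [(max(col) - l) or 1.0 for col, l in zip(zip(*X), lo)]

    def norm(x):
        return [(v - l) / s for v, l, s in zip(x, lo, scale)]

    return [norm(x) for x in X], norm


def pretrain(enc, data, epochs, lr, rng):
    # Unsupervised stage: train `enc` as the encoder of a one-hidden-layer autoencoder.
    dec = Dense(len(enc.W), len(enc.W[0]), rng)
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # random presentation order
            h = enc.forward(x)
            y = dec.forward(h)
            d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]  # squared error
            d_hid = [g * hi * (1 - hi) for g, hi in zip(dec.grad_input(d_out), h)]
            dec.step(h, d_out, lr)
            enc.step(x, d_hid, lr)


def train_sae_classifier(X, labels, hidden=(6, 4), pre_epochs=20, ft_epochs=50, lr=0.5, seed=0):
    rng = random.Random(seed)
    data, norm = minmax_normalize(X)
    layers, feats, n_in = [], data, len(data[0])
    for n_hid in hidden:  # greedy layer-wise pre-training
        enc = Dense(n_in, n_hid, rng)
        pretrain(enc, feats, pre_epochs, lr, rng)
        layers.append(enc)
        feats, n_in = [enc.forward(x) for x in feats], n_hid
    layers.append(Dense(n_in, 1, rng))  # single voiced/unvoiced output unit
    order = list(range(len(data)))
    for _ in range(ft_epochs):  # supervised fine-tuning by back-propagation
        rng.shuffle(order)
        for k in order:
            acts = [data[k]]
            for layer in layers:
                acts.append(layer.forward(acts[-1]))
            delta = [acts[-1][0] - labels[k]]  # sigmoid + cross-entropy gradient
            for li in range(len(layers) - 1, -1, -1):
                g = layers[li].grad_input(delta)  # computed before this layer is updated
                layers[li].step(acts[li], delta, lr)
                delta = [gi * a * (1 - a) for gi, a in zip(g, acts[li])]
    return layers, norm


def predict(model, x):
    layers, norm = model
    h = norm(x)
    for layer in layers:
        h = layer.forward(h)
    return 1 if h[0] >= 0.5 else 0  # 1 = voiced, 0 = unvoiced
```

In use, `X` would hold one feature vector per frame taken from noisy speech and `labels` the matching voiced/unvoiced flags from the clean speech, as the abstract describes; `predict(model, x)` then classifies a new frame. The normalization function is returned with the layers so that test frames are scaled exactly like the training frames.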
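The synthesis side of the bandwidth expansion (FFT of the decoded frame, mapping the narrowband envelope to a high-band envelope, borrowing a flipped narrowband phase, IFFT, and combination in the time domain) can be sketched per frame as follows. This is a sketch under stated assumptions, not the thesis's implementation: the decoded frame is assumed to be already upsampled to 16 kHz so that bins below n/4 hold the 0 to 4 kHz band; the magnitude spectrum stands in for the spectral envelope; `mapper` is a hypothetical placeholder for the trained FFNN; and "flipping" the phase is interpreted as mirroring the narrowband phase about the 4 kHz cutoff. Windowing, overlap-add and the Nyquist bin are omitted, and a naive O(n^2) DFT keeps the example dependency-free.

```python
import cmath
import math


def dft(x):
    # Naive O(n^2) DFT; a real system would call an FFT routine instead.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def extend_frame(frame, mapper):
    """Add a synthesized high band to one narrowband frame sampled at the wideband rate.

    mapper: hypothetical stand-in for the trained FFNN; takes the `cut` narrowband
    magnitudes and returns `cut` high-band magnitudes.
    """
    n = len(frame)
    cut = n // 4                       # bin index of 4 kHz when sampling at 16 kHz
    spec = dft(frame)
    nb_mag = [abs(spec[k]) for k in range(cut)]
    nb_phase = [cmath.phase(spec[k]) for k in range(cut)]
    hb_mag = mapper(nb_mag)            # predicted high-band envelope
    hb_phase = nb_phase[::-1]          # assumed "flip": narrowband phase mirrored about the cutoff
    hb = [0j] * n
    for m in range(cut):
        k = cut + m
        if k >= n // 2:                # Nyquist bin and above are left at zero
            break
        hb[k] = cmath.rect(hb_mag[m], hb_phase[m])
        hb[n - k] = hb[k].conjugate()  # Hermitian symmetry keeps the inverse transform real
    high = [v.real for v in idft(hb)]
    # Time-domain combination of the original narrowband frame and the new high band.
    return [a + b for a, b in zip(frame, high)]
```

Because only bins at or above the cutoff are filled, the original 0 to 4 kHz content passes through unchanged, and the network only has to supply the missing band.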
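The log spectral distortion used to evaluate the bandwidth extension is commonly defined per frame as LSD = sqrt((1/K) * sum_k (20 log10|X(k)| - 20 log10|X_hat(k)|)^2), in dB, where X(k) is the reference wideband spectrum and X_hat(k) the extended one, and is then averaged over frames. The sketch below implements that common definition; the abstract does not state the thesis's exact variant (frequency range, frame length, averaging), so these details are assumptions, as is the small floor `eps` used to avoid taking the log of zero.

```python
import math


def lsd_frame(ref_mag, est_mag, eps=1e-12):
    # Root-mean-square difference of the two log-magnitude spectra, in dB.
    diffs = [(20 * math.log10(max(r, eps)) - 20 * math.log10(max(e, eps))) ** 2
             for r, e in zip(ref_mag, est_mag)]
    return math.sqrt(sum(diffs) / len(diffs))


def lsd(ref_frames, est_frames):
    # Average the per-frame LSD over all frames.
    values = [lsd_frame(r, e) for r, e in zip(ref_frames, est_frames)]
    return sum(values) / len(values)
```

As a sanity check, a uniform gain error of a factor of 2 in every bin gives 20 log10 2, about 6.02 dB, so the reported reduction from 2.2372 dB to 0.8883 dB corresponds to a substantially closer match to the wideband reference spectrum.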
Keywords/Search Tags: Speech Coding, Mixed Excitation Linear Prediction, Deep Neural Network, Speech Classification, Band Extension