
Speech-Music Discrimination For Hybrid Coder

Posted on: 2019-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Yang
Full Text: PDF
GTID: 2428330545986970
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
A hybrid coder chooses its coding mode according to the type of the input signal in order to obtain the best coding quality. For such a coder, the accuracy of speech/music discrimination is one of the decisive factors. AMR-WB+ (Extended Adaptive Multi-Rate Wideband codec) and EVS (Codec for Enhanced Voice Services), both from 3GPP, are typical hybrid codecs. AMR-WB+ has two coding mode selection methods, a closed-loop method and an open-loop method. The closed-loop method classifies more accurately than the open-loop method, but its complexity is also higher. The open-loop method has much lower complexity, but its accuracy is unsatisfactory. EVS has no closed-loop method and therefore has lower complexity, but its classification algorithm, which is based on a Gaussian mixture model (GMM), leaves significant room for improvement. To address these problems, this thesis exploits neural networks and the temporal correlation between audio frames, and proposes a recurrent neural network (RNN) for speech/music discrimination. The main contributions are listed below.

(1) RNN classifier for AMR-WB+. The RNN classifier takes its features from AMR-WB+ itself and labels the training data with the output of the closed-loop method. The goal is a classifier that runs in the open-loop method and matches the output of the closed-loop method. To this end, the thesis designs an RNN-based classifier, solves the problem of unbalanced training data, and controls the output of the RNN with the aim of maximizing coding quality. Experiments show that the proposed method has complexity similar to that of the open-loop method and accuracy comparable to that of the closed-loop method. Accuracy improves by about 20%, and the subjective quality is also quite close to that of the closed-loop method.

(2) RNN classifier for EVS. The EVS codec cannot label its training data with a strategy like the closed-loop method in AMR-WB+ and can only rely on subjective judgment, so the datasets must be sufficiently pure. The thesis selects speech and music data from professional audio datasets to build the training and test sets, and obtains the features for RNN training from the EVS codec. Experiments show that accuracy improves for both music and speech signals, especially for music.

These contributions are significant for improving the performance of hybrid codecs.
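The ideas named in the abstract (a recurrent state that carries context from frame to frame, and a correction for unbalanced speech/music labels) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the Elman-style network, the random weights, the three-value feature vectors, and the inverse-frequency class weighting are not taken from the thesis, which does not state its architecture, features, or balancing method here. Training is omitted; the sketch only shows a forward pass and a class-weighted loss.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * count) so rare classes count more.

    This is one common remedy for unbalanced data, assumed here for illustration.
    """
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


class ElmanRNN:
    """Tiny Elman-style RNN with random, untrained weights (hypothetical model)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = mat(n_hidden, n_features)   # input-to-hidden weights
        self.w_rec = mat(n_hidden, n_hidden)    # hidden-to-hidden (recurrent) weights
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def forward(self, frames):
        """Return P(music) per frame; the hidden state links each frame to earlier ones."""
        h = [0.0] * self.n_hidden
        probs = []
        for x in frames:
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                )
                for j in range(self.n_hidden)
            ]
            logit = sum(w * hi for w, hi in zip(self.w_out, h))
            probs.append(1.0 / (1.0 + math.exp(-logit)))
        return probs


def weighted_cross_entropy(probs, labels, weights):
    """Mean over frames of the class-weighted binary cross-entropy (1 = music, 0 = speech)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += -weights[y] * math.log(p_true)
    return total / len(labels)


# Toy sequence: six speech frames and two music frames (made-up labels and features).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
frames = [[0.1 * t, 0.5, -0.2] for t in range(len(labels))]

weights = inverse_frequency_weights(labels)
model = ElmanRNN(n_features=3, n_hidden=4)
probs = model.forward(frames)
loss = weighted_cross_entropy(probs, labels, weights)
```

With this weighting, each music frame in the toy sequence contributes three times as much to the loss as each speech frame, which offsets the 6:2 imbalance. In a real system the frame features would come from the codec's own analysis stage and the weights would be learned by backpropagation through time.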
Keywords/Search Tags: Hybrid coder, coding mode selection, signal discrimination, recurrent neural network