
Acoustic Feature Learning With Deep Neural Networks For Phoneme Recognition

Posted on: 2015-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Zheng
Full Text: PDF
GTID: 2308330452469518
Subject: Computer Science and Technology
Abstract/Summary:
As a subfield of speech recognition, phoneme recognition has long been regarded as an important subject, because it is commonly used to gauge the quality of an acoustic model. Speech recognition has improved tremendously since deep neural networks were introduced into the field. Although much research has focused on building different types of neural networks and has achieved good results, another line of research, feature learning, has received little attention in speech recognition, even though it has been attracting more and more attention in other fields of computer science such as computer vision.

Our research started from the idea of feature learning and proposed a series of methods that use deep neural networks to learn features from acoustic features. Each method focuses on a different characteristic of acoustic features and learns new features from them. We then applied the learned features to a phoneme recognition task to test their performance. Specifically, we conducted the following research:

1. We proposed a tandem deep neural network approach and used it as an acoustic model for phoneme recognition. The tandem deep neural network uses two levels of deep neural networks as the acoustic model: the first level outputs a posterior distribution given the acoustic features, and we treat that posterior distribution as the input feature for the second-level deep neural network. In this way, the first-level network can be regarded as a feature learner, and the learned feature is then modeled by the second level. We investigated the best configuration of the second-level deep neural network, and the best results achieved a relative improvement of 4% over a single deep neural network model.

2. We proposed the multivariate Gaussian restricted Boltzmann machine and used it as a model to learn features for robust speech recognition. The multivariate Gaussian restricted Boltzmann machine was designed specifically for acoustic feature vectors and is intended as an improvement on the Gaussian restricted Boltzmann machine. We showed on the Aurora-2 corpus that features learned by this model gain 10% in accuracy over the original acoustic features.

3. We proposed a new model, called the contrastive auto-encoder, which learns new features with special characteristics from the original acoustic features. It was proposed mainly to deal with the compounded information in acoustic features. The contrastive auto-encoder consists of 2 auto-encoders whose middle layers are coupled. The model is optimized so that it extracts the information most relevant to the task at hand. We showed that the features extracted with this model are superior to the original features.

4. We proposed a new framework that learns dynamic features with a neural network; the learned features can be used as an alternative to traditional dynamic features. Dynamic features have long been an essential part of the features used in speech recognition, yet little research has been done to improve them. We proposed replacing the relatively simple formula traditionally used to compute dynamic features with a neural network layer, which extends the computation to a much wider space of possible features. By optimizing for a task such as phoneme recognition, the neural network we designed learns a new kind of dynamic feature from that space. We showed that it performs much better than the traditional dynamic feature, and that the effect becomes clearer as the order of the differences increases.
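To make the fourth item concrete, the following is a minimal sketch of the conventional regression-based delta (dynamic) feature that such a learned layer would stand in for. It is not taken from the thesis: the function names, the window size of 2, and the edge padding are illustrative assumptions. The sketch also shows that the fixed formula is just one particular set of weights over a window of neighbouring frames, which is what makes it natural to replace those weights with trainable ones.

```python
import numpy as np


def delta_features(feats, window=2):
    """Conventional delta features (hypothetical helper, not from the thesis).

    feats: array of shape (T, D), one acoustic feature vector per frame.
    Computes d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), n = 1..window.
    """
    T = feats.shape[0]
    # Repeat the first and last frames so every frame has a full context.
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    delta = np.zeros_like(feats, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + T]    # c_{t+n}
        behind = padded[window - n : window - n + T]   # c_{t-n}
        delta += n * (ahead - behind)
    return delta / denom


def delta_as_context_weights(window=2):
    """The same formula written as fixed weights over 2*window+1 context frames."""
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    return np.arange(-window, window + 1) / denom


def apply_context_weights(feats, weights):
    """Apply one weight per context frame; a trainable layer would learn `weights`."""
    window = (len(weights) - 1) // 2
    T = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    # Stack the context frames into shape (T, 2*window+1, D).
    context = np.stack([padded[k : k + T] for k in range(2 * window + 1)], axis=1)
    return np.einsum("k,tkd->td", weights, context)
```

Under these assumptions, `apply_context_weights(feats, delta_as_context_weights())` reproduces `delta_features(feats)`, and higher-order dynamic features follow by applying the same operation to the result. A learned variant would treat `weights` (or a fuller matrix over the context window) as parameters optimized for the recognition task.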
Keywords/Search Tags: phoneme recognition, speech recognition, neural network, feature learning, deep learning