
Alternative regularized neural network architectures for speech and speaker recognition

Posted on: 2013-11-17
Degree: Ph.D.
Type: Thesis
University: The Johns Hopkins University
Candidate: Garimella, Sri Venkata Surya
Full Text: PDF
GTID: 2458390008474334
Subject: Biology

Abstract/Summary:
Artificial Neural Networks (ANNs) have been widely used in a variety of speech processing applications. They can operate in either a classification or a regression mode. Proper regularization is necessary when training these networks, especially when the amount of training data is limited or the number of layers is large. In this thesis, we explore alternative regularized feed-forward neural network architectures and propose learning algorithms for speech processing applications such as phoneme recognition and speaker verification.

In a conventional hybrid phoneme recognition system, a multilayer perceptron (MLP) with a single hidden layer is trained on standard acoustic features to provide estimates of the posterior probabilities of phonemes. These estimates are then used to decode the underlying phoneme sequence. In this thesis, we introduce a sparse multilayer perceptron (SMLP) that jointly learns an internal sparse feature representation and nonlinear classifier boundaries to discriminate among multiple phoneme classes. This is achieved by adding a sparsity-inducing regularization term to the original cross-entropy cost function. The SMLP then replaces the MLP in the hybrid phoneme recognition system. Experiments are conducted with various feature representations, including the proposed data-driven discriminative spectro-temporal features, and significant improvements are obtained with these techniques.

Another application of neural networks is speaker verification. An Auto-Associative Neural Network (AANN) is a fully connected feed-forward neural network trained to reconstruct its input at its output through a hidden compression layer. AANNs are used to model speakers in speaker verification: a speaker-specific AANN model is obtained by adapting (or retraining) the Universal Background Model (UBM) AANN, an AANN trained on multiple held-out speakers, using the corresponding speaker's data.
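The SMLP objective described above, cross-entropy plus a sparsity penalty on the hidden activations, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the network shapes, the tanh hidden layer, the L1 form of the penalty, and the weight `lam` are all assumptions for the sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def smlp_cost(x, y, W1, b1, W2, b2, lam):
    """Cross-entropy over phoneme posteriors plus a sparsity
    penalty (here L1, an assumption) on the hidden activations."""
    h = np.tanh(x @ W1 + b1)              # internal feature representation
    p = softmax(h @ W2 + b2)              # phoneme posterior estimates
    cross_entropy = -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))
    sparsity = lam * np.mean(np.abs(h))   # sparse regularization term
    return cross_entropy + sparsity

# Toy demonstration with random parameters (hypothetical sizes)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))          # 4 frames of 10-dim acoustic features
y = np.array([0, 1, 2, 0])                # phoneme-class targets
W1, b1 = 0.1 * rng.standard_normal((10, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.standard_normal((8, 3)), np.zeros(3)
cost_plain = smlp_cost(x, y, W1, b1, W2, b2, lam=0.0)
cost_sparse = smlp_cost(x, y, W1, b1, W2, b2, lam=0.1)
```

Setting `lam` to zero recovers the original cross-entropy cost; a positive `lam` trades classification accuracy against sparser internal features.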
When the amount of speaker data is limited, this procedure may lead to overfitting, because all the parameters of the UBM-AANN are adapted. To alleviate this problem, we regularize the AANN parameters by developing subspace methods, namely weighted least squares (WLS) and factor analysis (FA). Experimental results show that the subspace methods are more effective than directly adapting a UBM-AANN for speaker verification.
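The idea behind the subspace regularization above can be sketched as follows: instead of adapting every encoder weight of the UBM-AANN, estimate only a low-dimensional factor `z`, with the speaker's encoder constrained to `vec(W_enc) = vec(W_enc_ubm) + U @ z`. This is a hedged, factor-analysis-flavored sketch under simplifying assumptions (single tanh compression layer, decoder held fixed, random subspace basis `U`, plain gradient descent), not the WLS or FA estimators developed in the thesis.

```python
import numpy as np

def aann_reconstruct(x, W_enc, W_dec):
    """Reconstruct the input through a tanh compression layer."""
    return np.tanh(x @ W_enc) @ W_dec

def adapt_in_subspace(x_spk, W_enc_ubm, W_dec, U, steps=200, lr=0.1):
    """Estimate only the low-dimensional factor z, with
    vec(W_enc) = vec(W_enc_ubm) + U @ z, by gradient descent
    on the reconstruction error of the speaker's data."""
    z = np.zeros(U.shape[1])
    n = len(x_spk)
    for _ in range(steps):
        W_enc = W_enc_ubm + (U @ z).reshape(W_enc_ubm.shape)
        h = np.tanh(x_spk @ W_enc)
        err = h @ W_dec - x_spk                  # reconstruction residual
        d_h = (err @ W_dec.T) * (1.0 - h ** 2)   # back-prop through tanh
        g_W = x_spk.T @ d_h / n                  # gradient w.r.t. W_enc
        z -= lr * (U.T @ g_W.ravel())            # project onto the subspace
    return z

# Toy demonstration on random "speaker" data (hypothetical sizes)
rng = np.random.default_rng(1)
x_spk = rng.standard_normal((20, 6))             # limited speaker data
W_enc_ubm = 0.1 * rng.standard_normal((6, 3))    # UBM-AANN encoder
W_dec = 0.1 * rng.standard_normal((3, 6))        # decoder (kept fixed here)
U = 0.1 * rng.standard_normal((18, 4))           # subspace basis, 18 = 6 * 3
err_before = np.mean((aann_reconstruct(x_spk, W_enc_ubm, W_dec) - x_spk) ** 2)
z = adapt_in_subspace(x_spk, W_enc_ubm, W_dec, U)
W_enc_spk = W_enc_ubm + (U @ z).reshape(W_enc_ubm.shape)
err_after = np.mean((aann_reconstruct(x_spk, W_enc_spk, W_dec) - x_spk) ** 2)
```

Because only the 4-dimensional `z` is estimated rather than all 18 encoder weights, far fewer parameters are fitted to the limited speaker data, which is the overfitting protection the subspace methods provide. In verification, the speaker-specific reconstruction error of a test utterance would then serve as the basis of the score.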
Keywords/Search Tags: Neural network, Speaker, AANN, Speech, Used, Recognition