
Alternative regularized neural network architectures for speech and speaker recognition

Posted on: 2013-11-17
Degree: Ph.D.
Type: Thesis
University: The Johns Hopkins University
Candidate: Garimella, Sri Venkata Surya
Full Text: PDF
GTID: 2458390008474334
Subject: Biology

Abstract/Summary:
Artificial Neural Networks (ANNs) have been widely used in a variety of speech processing applications. They can operate in either a classification or a regression mode. Proper regularization is necessary when training these networks, especially when the amount of training data is limited or the number of layers is large. In this thesis, we explore alternative regularized feed-forward neural network architectures and propose learning algorithms for speech processing applications such as phoneme recognition and speaker verification.

In a conventional hybrid phoneme recognition system, a multilayer perceptron (MLP) with a single hidden layer is trained on standard acoustic features to provide estimates of the posterior probabilities of phonemes. These estimates are then used to decode the underlying phoneme sequence. In this thesis, we introduce a sparse multilayer perceptron (SMLP) that jointly learns an internal sparse feature representation and nonlinear classifier boundaries to discriminate among multiple phoneme classes. This is achieved by adding a sparsity-inducing regularization term to the original cross-entropy cost function. The SMLP then replaces the MLP in the hybrid phoneme recognition system. Experiments are conducted with various feature representations, including the proposed data-driven discriminative spectro-temporal features, and significant improvements are obtained with these techniques.

Another application of neural networks is speaker verification. An Auto-Associative Neural Network (AANN) is a fully connected feed-forward neural network trained to reconstruct its input at its output through a hidden compression layer. AANNs are used to model speakers in speaker verification: a speaker-specific AANN model is obtained by adapting (or retraining) the Universal Background Model (UBM) AANN, an AANN trained on multiple held-out speakers, using the corresponding speaker's data.
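The SMLP objective described above, cross-entropy plus a sparsity penalty on the hidden activations, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the network shapes, the tanh hidden layer, the L1 form of the penalty, and the weight `lam` are all assumptions for the sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def smlp_cost(x, y, W1, b1, W2, b2, lam):
    """Cross-entropy over phoneme posteriors plus a sparsity
    penalty (here L1, an assumption) on the hidden activations."""
    h = np.tanh(x @ W1 + b1)              # internal feature representation
    p = softmax(h @ W2 + b2)              # phoneme posterior estimates
    cross_entropy = -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))
    sparsity = lam * np.mean(np.abs(h))   # sparse regularization term
    return cross_entropy + sparsity

# Toy demonstration with random parameters (hypothetical sizes)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))          # 4 frames of 10-dim acoustic features
y = np.array([0, 1, 2, 0])                # phoneme-class targets
W1, b1 = 0.1 * rng.standard_normal((10, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.standard_normal((8, 3)), np.zeros(3)
cost_plain = smlp_cost(x, y, W1, b1, W2, b2, lam=0.0)
cost_sparse = smlp_cost(x, y, W1, b1, W2, b2, lam=0.1)
```

Setting `lam` to zero recovers the original cross-entropy cost; a positive `lam` trades classification accuracy against sparser internal features.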
When the amount of speaker data is limited, this procedure may lead to overfitting, because all the parameters of the UBM-AANN are adapted. To alleviate this problem, we regularize the AANN parameters by developing subspace methods, namely weighted least squares (WLS) and factor analysis (FA). Experimental results show that the subspace methods are more effective than directly adapting a UBM-AANN for speaker verification.
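The idea behind the subspace regularization above can be sketched as follows: instead of adapting every encoder weight of the UBM-AANN, estimate only a low-dimensional factor `z`, with the speaker's encoder constrained to `vec(W_enc) = vec(W_enc_ubm) + U @ z`. This is a hedged, factor-analysis-flavored sketch under simplifying assumptions (single tanh compression layer, decoder held fixed, random subspace basis `U`, plain gradient descent), not the WLS or FA estimators developed in the thesis.

```python
import numpy as np

def aann_reconstruct(x, W_enc, W_dec):
    """Reconstruct the input through a tanh compression layer."""
    return np.tanh(x @ W_enc) @ W_dec

def adapt_in_subspace(x_spk, W_enc_ubm, W_dec, U, steps=200, lr=0.1):
    """Estimate only the low-dimensional factor z, with
    vec(W_enc) = vec(W_enc_ubm) + U @ z, by gradient descent
    on the reconstruction error of the speaker's data."""
    z = np.zeros(U.shape[1])
    n = len(x_spk)
    for _ in range(steps):
        W_enc = W_enc_ubm + (U @ z).reshape(W_enc_ubm.shape)
        h = np.tanh(x_spk @ W_enc)
        err = h @ W_dec - x_spk                  # reconstruction residual
        d_h = (err @ W_dec.T) * (1.0 - h ** 2)   # back-prop through tanh
        g_W = x_spk.T @ d_h / n                  # gradient w.r.t. W_enc
        z -= lr * (U.T @ g_W.ravel())            # project onto the subspace
    return z

# Toy demonstration on random "speaker" data (hypothetical sizes)
rng = np.random.default_rng(1)
x_spk = rng.standard_normal((20, 6))             # limited speaker data
W_enc_ubm = 0.1 * rng.standard_normal((6, 3))    # UBM-AANN encoder
W_dec = 0.1 * rng.standard_normal((3, 6))        # decoder (kept fixed here)
U = 0.1 * rng.standard_normal((18, 4))           # subspace basis, 18 = 6 * 3
err_before = np.mean((aann_reconstruct(x_spk, W_enc_ubm, W_dec) - x_spk) ** 2)
z = adapt_in_subspace(x_spk, W_enc_ubm, W_dec, U)
W_enc_spk = W_enc_ubm + (U @ z).reshape(W_enc_ubm.shape)
err_after = np.mean((aann_reconstruct(x_spk, W_enc_spk, W_dec) - x_spk) ** 2)
```

Because only the 4-dimensional `z` is estimated rather than all 18 encoder weights, far fewer parameters are fitted to the limited speaker data, which is the overfitting protection the subspace methods provide. In verification, the speaker-specific reconstruction error of a test utterance would then serve as the basis of the score.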
Keywords/Search Tags: Neural network, Speaker, AANN, Speech, Used, Recognition