Analysis of variability in speech with applications to speech and speaker recognition

Posted on: 2003-10-09
Degree: Ph.D
Type: Dissertation
University: OGI School of Science & Engineering
Candidate: Kajarekar, Sachin Subhash
Full Text: PDF
GTID: 1468390011483516
Subject: Engineering
Abstract/Summary:
The speech signal has variability due to language, speakers, and communication channels. In this work, variability due to language is taken to mean variability due to the different phones in the language; it is also called inter-phone or phone variability. Variability due to speakers is referred to as speaker variability, and variability due to different communication channels is referred to as channel variability. The remaining variability in the signal is referred to as residual variability.

The total variability in speech is decomposed using multivariate analysis of variance (MANOVA). Here, variability in speech refers to variability in the set of features extracted from the speech signal, and variability is measured as the covariance of the features due to different phones, different speakers, and different channels. In this work, MANOVA is performed using three databases: HTIMIT, OGI Stories, and OGI Numbers. Variability in the commonly used features is measured in the spectral and temporal domains. The results are shown to be consistent across different databases and datasets, and also consistent with previous studies.

The results of MANOVA are applied in two ways. First, we show that the contribution of each source of variability in the features is related to their performance on speech and speaker recognition tasks. Second, we show that the results of MANOVA can be used to derive discriminant features for a given task.

The relationship between the results of MANOVA and speech and speaker recognition results is illustrated using several examples. We show that a change in the contribution of phone variability is related to a change in the performance of the features on the speech recognition task.

Using MANOVA, we observed that the variability due to phones spreads over approximately 250 ms around the current frame. We include this variability in the design of features using linear discriminant analysis (LDA). The results show that features from joint analysis perform worse than those from combined analysis because joint analysis over-fits the training data and does not generalize to the test data. Specifically, we show that the combination of spectral and temporal discriminants yields the best joint time-frequency discriminants. (Abstract shortened by UMI.)
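
The covariance decomposition described in the abstract can be illustrated with a minimal sketch for a single factor (for example, phone labels only). This is not the dissertation's procedure, which handles phones, speakers, and channels together. The function name `decompose_covariance` and the synthetic data are assumptions made purely for illustration. The sketch splits the total covariance of a feature set into a between-group part (attributable to the factor) and a within-group part (the residual), and these two parts sum to the total.

```python
import numpy as np


def decompose_covariance(features, labels):
    """Split total covariance into between-group and within-group parts.

    Hypothetical helper: one factor only (e.g. phone identity).
    Returns (total, between, within) with total == between + within.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = features.shape[0]

    grand_mean = features.mean(axis=0)
    centered = features - grand_mean
    total = centered.T @ centered / n

    between = np.zeros_like(total)
    within = np.zeros_like(total)
    for group in np.unique(labels):
        x = features[labels == group]
        group_mean = x.mean(axis=0)
        # Between-group term: spread of the group mean around the grand mean.
        d = (group_mean - grand_mean)[:, None]
        between += x.shape[0] * (d @ d.T) / n
        # Within-group term: spread of the samples around their group mean.
        r = x - group_mean
        within += r.T @ r / n
    return total, between, within


# Synthetic data (an assumption): 3 groups of 200 frames, 4-dimensional features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 200)
group_means = rng.normal(scale=2.0, size=(3, 4))
features = group_means[labels] + rng.normal(size=(600, 4))

total, between, within = decompose_covariance(features, labels)
# Share of total variance (trace of covariance) explained by the factor.
print(np.trace(between) / np.trace(total))
```

The ratio of traces gives one scalar summary of how much of the total variability the factor accounts for. Extending the sketch to several nested or crossed factors requires further partitioning of the within-group term.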
Keywords/Search Tags:Variability, Speech, Speaker, MANOVA, Show, Recognition, Features