Research On Acoustic Unit Modeling And Its Application Based On Nonparametric Bayesian Method

Posted on: 2019-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: R R Wang
GTID: 2428330566970968
Subject: Information and Communication Engineering
Abstract/Summary:
As speech data become ever easier to acquire, we now live in an era in which speech data can be collected almost without limit. However, we cannot yet make full and effective use of unlabeled speech data, because labeling speech is a time-consuming and expensive process. To make speech recognition more successful, we need to reduce its dependence on large amounts of labeled data. Unsupervised acoustic unit discovery aims to find acoustic units in unlabeled speech data, and it has a wide range of applications in automatic speech recognition and cognitive science. This thesis studies acoustic unit discovery and its applications to speech based on nonparametric Bayesian methods. The main work and contributions are as follows:

1. Acoustic unit discovery based on multilingual resources. Because different languages share similar acoustic units, an acoustic unit discovery technique that exploits multilingual resources is proposed. With a Dirichlet process (DP) as the prior, a Dirichlet process hidden Markov model (DPHMM) is built for acoustic unit discovery. The model is tested on the TIMIT corpus, whose language (English) differs from the training languages. The experimental results show that the nonparametric Bayesian model trained on multilingual resources discovers acoustic units that are highly correlated with the English phone set.

2. One-shot learning of spoken words based on multilingual resources. A Bayesian hierarchical hidden Markov model (HHMM) is used for acoustic unit discovery; it learns not only the unigram statistics of the acoustic units but also the bigram transition probabilities between the discovered units. Because every word can be composed of these acoustic units, a one-shot classification experiment can be carried out on top of the discovered units: each new word is classified from only one spoken example. The words to be classified are in English, Japanese and Chinese, all of which differ from the training languages. The experimental results show good classification results in every language; that is, an acoustic model trained on multilingual resources can effectively guide the classification of spoken words in different target languages.

3. Discovery of hierarchical linguistic structures based on a nonparametric Bayesian method. Taking the Adaptor Grammar (AG) model as its basis, this thesis integrates it with a noisy-channel model and an acoustic model to construct a new probabilistic framework for discovering hierarchical linguistic structures in speech data. The framework not only finds acoustic units in continuous speech but also learns higher-level structures, such as syllables and lexical units, directly from the acoustic signal. On the TIMIT corpus, the experimental results show that the model learns lexical units that correspond to sub-words, single words and multi-word phrases, and a subjective comparative analysis shows that the discovered structures are consistent with the linguistic structure of the actual sentences.

4. Variational Bayesian acoustic unit discovery. Gibbs sampling (GS), a stochastic approximation method, is the inference algorithm most commonly used for Bayesian models. Its drawbacks are that the model parameters cannot be sampled asynchronously and that it converges slowly, which limits its ability to handle large datasets. Variational Bayesian (VB) inference is a deterministic approximation method: it iterates a set of mutually dependent update equations until they converge to a (local) optimum, and these updates can be parallelized. The experimental results show that VB training is faster than GS thanks to parallelization. Moreover, VB training requires no boundary information at all, whereas GS requires the speech data to be pre-segmented, and VB still outperforms GS.
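The Dirichlet process prior in contribution 1 is what lets the DPHMM leave the number of acoustic units open instead of fixing it in advance. One way to see this is the prior's sequential form, the Chinese restaurant process: each new speech segment joins an existing unit with probability proportional to that unit's current count, or opens a new unit with probability proportional to the concentration parameter `alpha`. The sketch below samples from the prior only and ignores the acoustics entirely; the function name `crp_assignments` and its parameters are illustrative assumptions, not code from the thesis.

```python
import random
from typing import List


def crp_assignments(num_segments: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample acoustic-unit labels for a sequence of speech segments from a
    Chinese restaurant process (the sequential view of a Dirichlet process prior).

    Illustrative sketch only: no acoustic likelihood is involved.
    Segment i joins existing unit k with probability n_k / (i + alpha)
    and opens a new unit with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of segments assigned to unit k so far
    labels: List[int] = []
    for _ in range(num_segments):
        # Unnormalised weights: one per existing unit, plus alpha for a new unit.
        # They sum to i + alpha because sum(counts) == i at this point.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # the segment opens a brand-new acoustic unit
        else:
            counts[k] += 1    # the segment reuses an existing unit
        labels.append(k)
    return labels
```

Larger values of `alpha` tend to produce more units: `len(set(crp_assignments(1000, 50.0)))` is typically far larger than `len(set(crp_assignments(1000, 0.5)))`. In a sampler for a full DP model, these prior weights would additionally be combined with each unit's likelihood of the observed segment.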
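Contribution 2 states that the HHMM learns bigram transition probabilities between the discovered units. As a minimal illustration only, assuming utterances have already been decoded into integer unit labels, such a transition table can be estimated by counting adjacent unit pairs and taking the posterior mean under a symmetric Dirichlet prior (add-β smoothing). This count-based estimate is a simplified stand-in for the thesis's hierarchical Bayesian inference, and `bigram_transitions` and `beta` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def bigram_transitions(sequences: Sequence[Sequence[int]], num_units: int,
                       beta: float = 1.0) -> List[List[float]]:
    """Estimate P(next unit | current unit) from decoded unit sequences.

    Each row is the posterior mean under a symmetric Dirichlet(beta) prior:
        (count(i, j) + beta) / (count(i, *) + num_units * beta)
    so rows sum to one and unseen transitions keep non-zero probability.
    """
    counts: Counter = Counter()
    for seq in sequences:
        # Count every adjacent pair (a, b) within one utterance.
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    table: List[List[float]] = []
    for i in range(num_units):
        row_total = sum(counts[(i, j)] for j in range(num_units))
        denom = row_total + num_units * beta
        table.append([(counts[(i, j)] + beta) / denom for j in range(num_units)])
    return table
```

For instance, `bigram_transitions([[0, 1, 0, 1], [1, 0]], num_units=2)` gives `[[0.25, 0.75], [0.75, 0.25]]`: each row sums to one, and β keeps unseen transitions such as 0→0 from receiving zero probability.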
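The one-shot task in contribution 2 classifies a new spoken word from a single example per word. The sketch below assumes that every word, including each single exemplar, has already been decoded into a sequence of discovered unit labels, and swaps in a plain nearest-neighbour rule using length-normalised edit distance; the thesis's actual scoring on top of the HHMM units may differ. The functions `edit_distance` and `one_shot_classify`, the word labels and the unit sequences are all hypothetical.

```python
from typing import Dict, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two acoustic-unit sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def one_shot_classify(query: Sequence[int],
                      exemplars: Dict[str, Sequence[int]]) -> str:
    """Return the label whose single exemplar is closest to the query.

    Distances are divided by the longer sequence length so that long
    words are not penalised merely for having more units.
    """
    def score(label: str) -> float:
        ref = exemplars[label]
        return edit_distance(query, ref) / max(len(query), len(ref), 1)

    return min(exemplars, key=score)
```

For example, with exemplars `{"cat": [3, 7, 7, 2], "dog": [5, 1, 4]}`, the query `[3, 7, 2]` is at normalised distance 0.25 from "cat" and 1.0 from "dog", so it is labeled "cat". Because only unit sequences are compared, the same rule applies unchanged to English, Japanese or Chinese words, provided they are decoded with the same unit inventory.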
Keywords/Search Tags: Nonparametric Bayesian, Acoustic Unit Discovery, Multilingual Resource, One-shot Learning, Hierarchical Linguistic Structures, Variational Bayesian, Transfer Learning