Research On The Vocal Tract Model Based On Machine Learning Methods Of Speech Inversion

Posted on: 2014-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2248330395983798
Subject: Computer application technology
Abstract/Summary:
The performance of automatic speech recognition (ASR) systems is degraded by coarticulation. Related studies have claimed that articulatory information can be used to improve the performance of ASR systems. However, such articulatory information is not easy to obtain in typical speaker-listener situations, which is why acoustic-to-articulatory speech inversion has been proposed. Acoustic-to-articulatory speech inversion (speech inversion) is a method of estimating articulatory trajectories or vocal tract configurations from the speech signal. If articulatory information can be estimated accurately, it will be useful for speech synthesis, language acquisition, speech visualization and so on.
Firstly, tract variables (instead of traditional pellet trajectories) are used as articulatory information to model speech dynamics, and the estimation performance and non-uniqueness of tract variables and pellet trajectories are compared in this paper. The speech signals are parameterized as mel-frequency cepstral coefficients (MFCC), perceptual linear prediction cepstral coefficients (PLPCC) and linear prediction cepstral coefficients (LPCC), and a mixture density network (MDN) is used to estimate tract variables and pellet trajectories. The results indicate that tract variables provide better estimation performance than pellet trajectories. Furthermore, a model-based statistical paradigm is used to calculate the Normalized Non-Uniqueness (NNU), and the results show that non-uniqueness in the tract variable (TV)-based inverse model is comparatively lower than in the pellet-based model for the same six consonants.
Secondly, four different machine learning methods are used for speech inversion to compare tract variables and pellet trajectories: feedforward artificial neural network (FF-ANN), autoregressive artificial neural network (AR-ANN), distal supervised learning (DSL) and trajectory mixture density network (TMDN).
The results indicate that tract variables perform better than pellet trajectories and are better suited to articulatory feature-based ASR systems. In addition, the estimation performance of these machine learning methods for tract variables is compared when the speech signal is parameterized as MFCC and as acoustic parameters (AP). The results show that the FF-ANN with three hidden layers has the best estimation performance for tract variables.
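
As a rough illustration of the FF-ANN style of speech inversion summarized in the abstract, the following is a minimal sketch of a feedforward network with three hidden layers that maps context-stacked MFCC frames to tract variables. It is not the thesis's implementation: the 13 MFCCs per frame, the context of 8 frames on each side, the hidden layer sizes, the tanh activation, the 8 output tract variables, and all function names are assumptions chosen for illustration, and the weights are random and untrained, so only the data flow and shapes are meaningful.

```python
import numpy as np


def stack_context(frames, half_window):
    """Concatenate each frame with its neighbours on both sides.

    Edge frames are repeated so the output keeps one row per input frame.
    """
    n_frames = frames.shape[0]
    padded = np.pad(frames, ((half_window, half_window), (0, 0)), mode="edge")
    windows = [padded[i:i + n_frames] for i in range(2 * half_window + 1)]
    return np.concatenate(windows, axis=1)


def init_ffann(layer_sizes, rng):
    """Create (weights, bias) pairs for consecutive layers (random, untrained)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params.append((weights, np.zeros(n_out)))
    return params


def forward(params, x):
    """tanh hidden layers followed by a linear output layer (regression)."""
    h = x
    for weights, bias in params[:-1]:
        h = np.tanh(h @ weights + bias)
    weights, bias = params[-1]
    return h @ weights + bias


def rmse(pred, target):
    """Root-mean-square error per output dimension."""
    return np.sqrt(np.mean((pred - target) ** 2, axis=0))


rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))           # placeholder for real MFCC frames
x = stack_context(mfcc, half_window=8)      # shape (100, 17 * 13)

# Assumed sizes: input, three hidden layers, 8 tract variables at the output.
params = init_ffann([x.shape[1], 150, 100, 150, 8], rng)
tv_estimate = forward(params, x)            # shape (100, 8)
```

In practice the weights would be trained on paired acoustic and articulatory data, and a measure such as `rmse` (or a correlation coefficient) between estimated and reference trajectories would be used to compare tract variables with pellet trajectories.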
Keywords/Search Tags: speech inversion, tract variables, pellet trajectories, non-uniqueness