High-performance automatic speech recognition via enhanced front-end analysis and acoustic modeling
Posted on:2002-04-30 | Degree:Ph.D | Type:Thesis | University:University of California, Santa Barbara | Candidate:Gu, Liang | GTID:2468390011996370 | Subject:Engineering

Abstract/Summary:

This dissertation describes new paradigms and algorithms for the problem of automatic speech recognition, which is central to the future of human-machine interaction. The major performance bottlenecks of existing speech recognition techniques are suboptimal front-end analysis and suboptimal statistical classification (or acoustic modeling). These shortcomings motivate the research and the resulting approaches to the design of high-performance automatic speech recognizers.

One part of the thesis is concerned with the development of tools for optimizing the tradeoff between model complexity and modeling accuracy. The first tool is combined parameter estimation and model complexity reduction. The procedure starts by training a system of hidden Markov models (HMMs) with a large universal set of Gaussian densities. It then iteratively reduces the number of distinct parameters while re-optimizing the parameter values.

Combined parameter training and reduction is complemented by HMM state tying at the sub-state level. The state emission probabilities are constructed in two stages and viewed as a “mixture of mixtures of Gaussians.” An optimization technique is presented that seeks the best complexity-accuracy tradeoff by jointly exploiting Gaussian density sharing and sub-state tying.

To accommodate the considerable variability of speech signals in many applications, a technique is proposed to design multiple HMM prototypes for each speech class. The procedure starts with a conventional HMM initialization. It then maximizes the likelihood by alternating between data repartitioning and a modified Lloyd's algorithm for prototype re-estimation.

Another important concern is the prevalence of poor local optima that trap naive design methods. The proposed remedy is optimal parameter estimation via the deterministic annealing algorithm. The approach avoids many poor local solutions by introducing randomness into the classification rule during the training process. It minimizes the expected error rate while controlling the level of randomness via a constraint on the Shannon entropy.

The last part of the thesis is concerned with front-end analysis. A new set of features, the perceptual harmonic cepstral coefficients, is derived. A weighting function, which depends on the split-band analysis and the pitch harmonics, is applied to the power spectrum and ensures an accurate and robust representation of the spectral envelope of voiced speech. For perceptual considerations, within-filter cubic-root amplitude compression is applied to reduce amplitude variation without compromising the gain-invariance properties.

Simulation results show that the proposed approaches considerably improve recognition performance over conventional methods.

Keywords/Search Tags: Recognition, Speech, Front-end analysis, HMM
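The alternation between data repartitioning and prototype re-estimation described in the abstract can be illustrated with a much simpler model. The sketch below is not the thesis's algorithm: it assumes fixed-length feature vectors and replaces each HMM prototype with a single diagonal-covariance Gaussian, so that the Lloyd-style loop fits in a few lines. The names `diag_gauss_loglik` and `fit_prototypes`, the variance floor, and the caller-supplied initial means are all hypothetical choices made for illustration.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of each row of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def fit_prototypes(x, init_means, num_iters=50, var_floor=1e-3):
    """Alternate hard repartitioning and per-prototype re-estimation.

    Assumption: each prototype is a diagonal Gaussian standing in for an HMM.
    """
    means = np.array(init_means, dtype=float)
    variances = np.ones_like(means)  # start from unit variances
    labels = None
    for _ in range(num_iters):
        # Repartition: assign every sample to its most likely prototype.
        scores = np.stack(
            [diag_gauss_loglik(x, m, v) for m, v in zip(means, variances)], axis=1
        )
        new_labels = scores.argmax(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # the partition is stable, so re-estimation would change nothing
        labels = new_labels
        # Re-estimate: maximum-likelihood mean and variance of each partition cell.
        for k in range(len(means)):
            members = x[labels == k]
            if len(members) > 0:  # leave an empty prototype unchanged
                means[k] = members.mean(axis=0)
                variances[k] = np.maximum(members.var(axis=0), var_floor)
    return means, variances, labels
```

Like Lloyd's algorithm in general, this loop only reaches a local optimum that depends on the initialization, which is the weakness the deterministic annealing part of the thesis is meant to address.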
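The deterministic annealing idea in the abstract, a randomized classification rule whose expected error is minimized under an entropy constraint, can be sketched numerically. The abstract does not give the form of the randomized rule; the sketch below assumes a Gibbs (softmax) distribution over per-class discriminant scores with a scale parameter `gamma`, which is the form commonly used in deterministic annealing, and combines the two terms into a Lagrangian `expected_error - temperature * entropy`. The function names and the toy scores are hypothetical.

```python
import numpy as np

def randomized_rule(scores, gamma):
    """Gibbs distribution over classes; gamma = 0 is uniform, large gamma is nearly hard."""
    z = gamma * (scores - scores.max(axis=1, keepdims=True))  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def annealing_cost(scores, targets, gamma, temperature):
    """Return (Lagrangian, expected error rate, average Shannon entropy in nats)."""
    p = randomized_rule(scores, gamma)
    # Probability of error for each sample is one minus the probability of its true class.
    expected_error = 1.0 - p[np.arange(len(targets)), targets].mean()
    entropy = -np.mean(np.sum(p * np.log(p + 1e-12), axis=1))
    return expected_error - temperature * entropy, expected_error, entropy

# Toy discriminant scores for three samples and two classes (assumed values).
scores = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 3.0]])
targets = np.array([0, 1, 0])
for gamma in (0.0, 1.0, 50.0):
    cost, err, ent = annealing_cost(scores, targets, gamma, temperature=0.1)
    print(f"gamma={gamma:5.1f}  expected_error={err:.4f}  entropy={ent:.4f}  cost={cost:.4f}")
```

At `gamma = 0` the rule is uniform and the entropy is maximal; as `gamma` grows, the entropy falls toward zero and the expected error approaches the error rate of the hard decision rule. In a full training procedure the model parameters would be optimized at each temperature while the temperature is gradually lowered, a step this sketch omits.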