High-performance automatic speech recognition via enhanced front-end analysis and acoustic modeling

Posted on:2002-04-30

Degree:Ph.D

Type:Thesis

University:University of California, Santa Barbara

Candidate:Gu, Liang

Full Text:PDF

GTID:2468390011996370

Subject:Engineering

Abstract/Summary:

This dissertation describes new paradigms and algorithms for automatic speech recognition, which is central to the future of human-machine interaction. Major performance bottlenecks of existing speech recognition techniques are due to suboptimal front-end analysis and statistical classification (or acoustic modeling). These shortcomings motivate the proposed research and the resulting approaches to the design of high-performance automatic speech recognizers.

One part of the thesis is concerned with the development of tools for optimizing the tradeoff between model complexity and modeling accuracy. The first tool is combined parameter estimation and model complexity reduction. The procedure starts by training a system of hidden Markov models (HMMs) with a large universal set of Gaussian densities. It then iteratively reduces the number of distinct parameters while re-optimizing the parameter values.

Combined parameter training and reduction is complemented by HMM state tying at the sub-state level. The state emission probabilities are constructed in two stages and viewed as a “mixture of mixtures of Gaussians.” An optimization technique is presented that seeks the best complexity-accuracy tradeoff by jointly exploiting Gaussian density sharing and sub-state tying.

To accommodate the considerable variability of speech signals in many applications, a technique is proposed to design multiple HMM prototypes for each speech class. The procedure starts with a conventional HMM initialization. It then maximizes the likelihood by alternating between data repartitioning and a modified Lloyd's algorithm for prototype re-estimation.

Another important concern is the prevalence of poor local optima that trap naive design methods. The proposed remedy is optimal parameter estimation via the deterministic annealing algorithm. The approach avoids many poor local solutions by introducing randomness into the classification rule during the training process. It minimizes the expected error rate while controlling the level of randomness via a constraint on the Shannon entropy.

The last part of the thesis is concerned with the front-end analysis. A new set of features, the perceptual harmonic cepstral coefficients, is derived. A weighting function, which depends on the split-band analysis and the pitch harmonics, is applied to the power spectrum and ensures accurate and robust representation of the voiced speech spectral envelope. For perceptual considerations, within-filter cubic-root amplitude compression is applied to reduce amplitude variation without compromising the gain invariance properties.

Simulation results show considerable improvements in recognition performance over conventional methods when these proposed approaches are used.
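The randomized classification rule behind the deterministic annealing step can be illustrated with a small sketch. The abstract does not give the form of the rule, so the Gibbs (softmax) form below, the scale parameter `gamma`, the function names, and the toy scores are all assumptions made for illustration, not the thesis's actual formulation. The sketch only evaluates the expected error rate and the Shannon entropy of the rule for fixed class scores.

```python
import math

def gibbs_rule(scores, gamma):
    """Assumed randomized rule: P(class j | x) proportional to exp(gamma * score_j)."""
    m = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(gamma * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def expected_error(score_rows, labels, gamma):
    """Average probability of choosing a wrong class under the randomized rule."""
    err = 0.0
    for scores, label in zip(score_rows, labels):
        err += 1.0 - gibbs_rule(scores, gamma)[label]
    return err / len(labels)

def mean_entropy(score_rows, gamma):
    """Average Shannon entropy (in nats) of the rule over the samples."""
    h = 0.0
    for scores in score_rows:
        for p in gibbs_rule(scores, gamma):
            if p > 0.0:
                h -= p * math.log(p)
    return h / len(score_rows)

# Hypothetical per-class scores (e.g. log-likelihoods) for three samples, two classes.
score_rows = [[-1.0, -3.0], [-2.5, -2.0], [-4.0, -1.5]]
labels = [0, 0, 1]

# Raising gamma lowers the randomness (entropy) of the rule.
for gamma in (0.0, 1.0, 10.0):
    print(gamma, expected_error(score_rows, labels, gamma), mean_entropy(score_rows, gamma))
```

At `gamma = 0` the rule is uniform and the entropy is maximal. As `gamma` grows, the rule approaches a hard decision and the expected error approaches the ordinary classification error. A full annealing procedure would also re-optimize the model parameters at each entropy level, which this sketch leaves out.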

Keywords/Search Tags:

Recognition, Speech, Front-end analysis, HMM

Related items

1	Speech Recognition Front-End Processing Based On Deep Neural Network
2	The Research Of Front-end Processing Technology Based On The Speaker-independent Speech Recognition
3	Front-end of Wake-Up-Word Speech Recognition System Design on FPGA
4	Distributed Speech Recognition And Voice XML Standardlanguage In Vivid-Ring Application
5	Noise robust front-end processing for automatic speech recognition
6	Microphone array processing for robust speech recognition
7	Integrate template matching and statistical modeling for continuous speech recognition
8	Research On Continuous Speech Recognition Technology Based On HMM
9	Realization Of Front-end Algorithm In DSR System Based On FPGA
10	Research And Implementation Of Chinese Speech Recognition Methods In Noisy Environment