
Researches On Theory And Key Techniques Of Practical Chinese Speech Recognition

Posted on: 2000-11-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Tian
GTID: 1118359972450024
Subject: Communication and Information System
Abstract/Summary:
Most speech recognition systems are still in their infancy and run into problems when moved from the laboratory to real applications. Addressing these practical issues, this dissertation studies the theory and key techniques of Chinese speech recognition in depth and verifies the proposed methods through extensive experiments. The main contributions are as follows:
1. The dissertation analyzes the characteristics of the Chinese language from the viewpoint of communication theory and draws an interesting conclusion: the most important characteristic of the language is its very large alphabet, in which every symbol can carry a great amount of subjective information yet has rather low uncertainty because of contextual association.
2. The dissertation proposes a novel vector coding method, Vector Projection, which approximately represents a vector by the orthogonal projection of its endpoint onto a straight line. Theoretical analysis and experimental results suggest that its coding accuracy with an N-sized codebook is comparable to that of conventional vector quantization (VQ) with an N²-sized codebook and much better than that of multi-stage VQ with two N-sized codebooks, while its computational complexity is much lower than that of either.
3. The dissertation presents a unified approach to noisy, Lombard and loud speech based on training data compensation. A spectral addition method, derived as the inverse of conventional spectral subtraction, updates the training data to simulate additive noise, while cepstral compensation for Lombard and loud speech is based on HMM state labeling of the training data. The approach is robust in extremely noisy environments without apparent degradation in normal ones, and it does not increase the complexity of the recognition procedure.
4. The dissertation proposes two acoustic confidence measures, a state-dependent measure and a state-duration-dependent measure, for recognition acceptance/rejection in HMM-based speech recognition systems. They effectively reject out-of-vocabulary (OOV) words and misrecognized candidates, increasing the recognition rate at a low rejection rate. In addition, a fast search algorithm based on partial rejection of Chinese syllables is developed for finding N-best hypotheses; it greatly reduces the computational cost of the N-best algorithm without losing the optimal path.
5. The dissertation proposes an extended bigram language model that uses the information of the most effective word pairs within a sentence to describe long-distance movement phenomena in Chinese. An algorithm for detecting effective word pairs by minimizing language model perplexity is developed, and it outperforms the mutual information method. In addition, the dissertation proposes a statistical language modeling method that makes use of the word segmentation information provided by the acoustic signals.
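Contribution 1 contrasts a large alphabet with low per-symbol uncertainty once context is known. A toy illustration of the two quantities involved, assuming plain frequency estimates and only one character of context: `unigram_entropy` estimates H(X) and `conditional_entropy` estimates H(X_t | X_{t-1}). The function names and the sample text are illustrative only; the dissertation's own analysis presumably uses far larger corpora and may use different estimators.

```python
import math
from collections import Counter

def unigram_entropy(symbols):
    """H(X) in bits per symbol, from relative frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(symbols):
    """H(X_t | X_{t-1}) in bits per symbol, from adjacent pairs."""
    pairs = Counter(zip(symbols, symbols[1:]))
    prev_counts = Counter(symbols[:-1])
    total = len(symbols) - 1
    h = 0.0
    for (a, _b), c in pairs.items():
        p_ab = c / total                 # joint probability of the pair
        p_b_given_a = c / prev_counts[a] # probability of b after a
        h -= p_ab * math.log2(p_b_given_a)
    return h

text = "我们研究语音识别我们研究语言模型我们研究语音编码"
print(f"{len(set(text))} distinct characters, at most {math.log2(len(set(text))):.2f} bits/char")
print(f"H(X)          = {unigram_entropy(text):.2f} bits/char")
print(f"H(X | prev)   = {conditional_entropy(text):.2f} bits/char")
```

Even on this tiny sample, one character of context cuts the estimated uncertainty from about 3.5 bits to about 0.2 bits per character.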
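The abstract does not say which straight lines the Vector Projection method (contribution 2) projects onto. The sketch below assumes, as a guess motivated by the comparison with an N²-sized VQ codebook, that each line passes through a pair of codewords from an N-sized codebook, so a vector is coded as a pair index (i, j) plus a scalar position t. All names are hypothetical, t is left unquantized, and the exhaustive O(N²) pair search is for clarity only; it does not reproduce the low complexity the dissertation reports.

```python
import itertools
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_encode(x, codebook):
    """Conventional VQ: index of the nearest codeword."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))

def project_onto_line(x, a, b):
    """Orthogonal projection of x onto the line through a and b.
    Returns (t, p) with p = a + t * (b - a)."""
    d = [bj - aj for aj, bj in zip(a, b)]
    dd = sum(v * v for v in d)
    if dd == 0.0:  # degenerate "line": a and b coincide
        return 0.0, list(a)
    t = sum((xj - aj) * dj for xj, aj, dj in zip(x, a, d)) / dd
    return t, [aj + t * dj for aj, dj in zip(a, d)]

def vp_encode(x, codebook):
    """Hypothetical vector-projection coder: try the line through every
    codeword pair (i, j) and keep the closest projection.
    Returns (i, j, t, approximation); only (i, j, t) would be transmitted."""
    best = None
    for i, j in itertools.combinations(range(len(codebook)), 2):
        t, p = project_onto_line(x, codebook[i], codebook[j])
        err = sq_dist(x, p)
        if best is None or err < best[0]:
            best = (err, i, j, t, p)
    return best[1:]

random.seed(0)
dim, n = 4, 8
codebook = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
data = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(200)]

vq_mse = sum(sq_dist(x, codebook[vq_encode(x, codebook)]) for x in data) / len(data)
vp_mse = sum(sq_dist(x, vp_encode(x, codebook)[3]) for x in data) / len(data)
print(f"VQ, {n} codewords:          MSE = {vq_mse:.3f}")
print(f"VP, {n} codewords (lines):  MSE = {vp_mse:.3f}")
```

Because every candidate line passes through codewords, this projection error can never exceed plain VQ error with the same codebook; the price is the extra scalar t, which a real coder would have to quantize.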
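Spectral addition (contribution 3) runs spectral subtraction in reverse: instead of removing a noise estimate from noisy test speech, a noise estimate is added to clean training speech before feature extraction. A minimal sketch, assuming power-spectral frames, a noise estimate obtained by averaging noise-only frames, a hypothetical weight `alpha`, and a cepstrum computed as a DCT-II of the log power spectrum with mel warping omitted. The function names are illustrative, not the dissertation's.

```python
import math

def estimate_noise(noise_frames):
    """Average power spectrum of noise-only frames (a simple noise estimate)."""
    k = len(noise_frames[0])
    return [sum(f[i] for f in noise_frames) / len(noise_frames) for i in range(k)]

def spectral_addition(clean_power, noise_power, alpha=1.0):
    """Corrupt a clean training frame: |Y|^2 = |X|^2 + alpha * |N|^2,
    the reverse of spectral subtraction's |X|^2 = |Y|^2 - alpha * |N|^2."""
    return [x + alpha * n for x, n in zip(clean_power, noise_power)]

def power_to_cepstrum(power, n_ceps, floor=1e-10):
    """Cepstrum as a DCT-II of the log power spectrum (no mel warping here)."""
    log_p = [math.log(max(p, floor)) for p in power]
    k = len(log_p)
    return [sum(lp * math.cos(math.pi * q * (m + 0.5) / k) for m, lp in enumerate(log_p))
            for q in range(n_ceps)]

clean_frames = [[4.0, 2.0, 1.0, 0.5], [3.0, 3.0, 0.8, 0.2]]
noise_frames = [[0.1, 0.2, 0.3, 0.4], [0.3, 0.2, 0.1, 0.0]]
noise = estimate_noise(noise_frames)
noisy_frames = [spectral_addition(f, noise) for f in clean_frames]
noisy_ceps = [power_to_cepstrum(f, n_ceps=3) for f in noisy_frames]
print(noisy_ceps)
```

The noisy cepstra would then be used to retrain the HMMs. Because only the training data changes, the recognizer itself is untouched, which is consistent with the abstract's remark that recognition complexity does not increase.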
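For the Lombard and loud speech part of contribution 3, the abstract says only that cepstral compensation is "based on HMM state labeling of the training data". One simple rule consistent with that description, assumed here and not taken from the dissertation, is a per-state cepstral mean offset: frames of normal and Lombard speech are labeled with HMM states (for example by forced alignment), the mean difference is computed per state, and each normal training frame is shifted by its state's offset. State names and data are hypothetical.

```python
from collections import defaultdict

def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def state_offsets(normal, lombard):
    """normal, lombard: lists of (hmm_state, cepstrum) frames, e.g. from forced
    alignment. Returns {state: mean(lombard) - mean(normal)} for shared states."""
    groups_n, groups_l = defaultdict(list), defaultdict(list)
    for s, c in normal:
        groups_n[s].append(c)
    for s, c in lombard:
        groups_l[s].append(c)
    return {s: [l - n for n, l in zip(mean(groups_n[s]), mean(groups_l[s]))]
            for s in groups_n.keys() & groups_l.keys()}

def compensate(normal, offsets):
    """Shift each labelled training frame by its state's offset; frames whose
    state has no Lombard data are copied unchanged."""
    out = []
    for s, c in normal:
        d = offsets.get(s)
        out.append((s, list(c) if d is None else [x + y for x, y in zip(c, d)]))
    return out

normal = [("s1", [1.0, 0.0]), ("s1", [3.0, 0.0]), ("s2", [0.0, 1.0]), ("s3", [2.0, 2.0])]
lombard = [("s1", [2.5, 0.5]), ("s2", [0.0, 2.0])]
offsets = state_offsets(normal, lombard)
print(compensate(normal, offsets))
```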
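The abstract does not define the two confidence measures of contribution 4. As a generic stand-in for a state-dependent measure, the sketch below scores each frame by how far the log-likelihood of its aligned HMM state falls below the best-scoring state for that frame, averages within each aligned state segment, and then averages across segments. Such a score tends to drop when the aligned states do not match the audio well, as can happen with OOV words or misrecognitions. The scoring rule, the threshold value and all names are assumptions; the duration-dependent measure and the N-best search are not shown.

```python
from itertools import groupby

def frame_scores(loglik, alignment):
    """loglik[t][s]: log-likelihood of frame t under state s;
    alignment[t]: state the recognizer assigned to frame t.
    Score = log p(o_t | s_t) - max_s log p(o_t | s), which is always <= 0."""
    return [row[s] - max(row) for row, s in zip(loglik, alignment)]

def state_confidence(loglik, alignment):
    """Average frame scores within each aligned state segment, then average the
    segment means so that short and long states weigh equally."""
    scores = frame_scores(loglik, alignment)
    seg_means, t = [], 0
    for _, seg in groupby(alignment):
        n = sum(1 for _ in seg)
        seg_means.append(sum(scores[t:t + n]) / n)
        t += n
    return sum(seg_means) / len(seg_means)

def accept(loglik, alignment, threshold=-1.0):
    """Accept the hypothesis if its confidence reaches the (hypothetical) threshold."""
    return state_confidence(loglik, alignment) >= threshold

good = [[-1.0, -5.0], [-1.0, -4.0], [-6.0, -2.0]]
bad = [[-5.0, -1.0], [-4.0, -1.0], [-2.0, -6.0]]
alignment = [0, 0, 1]
print(state_confidence(good, alignment), accept(good, alignment))
print(state_confidence(bad, alignment), accept(bad, alignment))
```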
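Contribution 5 selects effective word pairs by minimizing perplexity. The abstract does not say how pair information enters the model, so this sketch assumes one simple normalized form: an add-one smoothed bigram is interpolated, with a hypothetical weight `lam`, with a uniform distribution over the target words "triggered" by words two or more positions back in the sentence. Pairs are then added greedily while held-out perplexity keeps falling. All names are hypothetical, the mutual-information baseline is not shown, and the demo reuses the training sentences as held-out data only to keep it short.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Add-one smoothed bigram P(w | prev) with <s> and </s> markers."""
    vocab = {w for s in sentences for w in s} | {"</s>"}
    big, uni = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            big[(a, b)] += 1
            uni[a] += 1
    v = len(vocab)
    return vocab, lambda prev, w: (big[(prev, w)] + 1) / (uni[prev] + v)

def perplexity(sentences, bigram, pairs, lam=0.3):
    """Perplexity of a hypothetical extended bigram: if a word two or more
    positions back triggers targets through `pairs`, mix in a uniform
    distribution over those targets with weight lam."""
    log_sum, n = 0.0, 0
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        for i in range(1, len(toks)):
            w = toks[i]
            p = bigram(toks[i - 1], w)
            history = set(toks[1:i - 1])  # excludes <s> and the predecessor
            targets = {b for a, b in pairs if a in history}
            if targets:
                p = (1 - lam) * p + lam * (1.0 / len(targets) if w in targets else 0.0)
            log_sum += math.log(p)
            n += 1
    return math.exp(-log_sum / n)

def select_pairs(train, heldout, k=2, lam=0.3):
    """Greedily add the word pair that lowers held-out perplexity the most."""
    _, bigram = train_bigram(train)
    candidates = sorted({(s[i], s[j]) for s in train
                         for i in range(len(s)) for j in range(i + 2, len(s))})
    chosen = []
    best_ppl = perplexity(heldout, bigram, chosen, lam)
    for _ in range(k):
        scored = [(perplexity(heldout, bigram, chosen + [c], lam), c)
                  for c in candidates if c not in chosen]
        if not scored:
            break
        ppl, pair = min(scored)
        if ppl >= best_ppl:  # stop when no pair helps
            break
        chosen.append(pair)
        best_ppl = ppl
    return chosen, best_ppl

corpus = [["because", "a", "so", "b"], ["because", "c", "so", "d"],
          ["x", "a", "then", "b"], ["x", "c", "then", "d"]]
vocab, bigram = train_bigram(corpus)
base = perplexity(corpus, bigram, [])
pairs, ppl = select_pairs(corpus, corpus)
print(f"bigram perplexity {base:.3f} -> extended {ppl:.3f} with pairs {pairs}")
```

Scoring candidates directly by the perplexity they produce, rather than by a co-occurrence statistic, means a pair is kept only if its gains on correctly predicted words outweigh the probability it takes away from every other word in the sentences where it fires.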
Keywords: speech recognition, vector quantization, Lombard effect, language model