
A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition

Posted on: 2017-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: X F Liu
GTID: 2278330485455844
Subject: Computer Science and Technology
Abstract/Summary:
Tibetan is an important minority language in China. Research on Tibetan speech recognition can not only enable barrier-free communication between different nationalities, but also promote development in many fields in Tibetan areas, such as the economy, culture and education. However, Tibetan speech recognition research is still in its early stages.

In the 21st century, deep learning has gradually become a new research direction for speech feature extraction. Deep learning applies multiple nonlinear transformations to extract features from raw data, moving from low level to high level, from the concrete to the abstract, and from general characteristics to semantic ones. This thesis applies deep learning methods to Tibetan speech recognition. First, we introduce the state of research on Tibetan speech recognition, the basic principles of speech recognition and the theory of deep learning. We then describe in detail how deep feature extraction models are applied to Tibetan speech recognition.

1. Study on Tibetan speech feature extraction
Hand-crafted features can lose information present in the original speech data, but deep learning can overcome this shortcoming and learn features that humans cannot define by hand. Features learned with deep learning can therefore reflect the rich information in the raw data. In this thesis, we use sparse autoencoder and deep belief network models to extract Tibetan speech features. Starting from the principles of these feature extraction models, we introduce the unsupervised training and supervised fine-tuning methods for deep learning models.

2. Study of acoustic models based on deep features for Tibetan speech recognition
Based on the deep features, we use a GMM-HMM model to recognize phonemes and syllables.
Experimental results show that the highest phoneme recognition rate with the SA+MFCC feature is 69.05%, which is 10.22 percentage points higher than with the MFCC feature, and the syllable recognition rate is 48.54%, which is 24.6 percentage points higher. The highest phoneme recognition rate with the DBN+MFCC feature is 69.46%, which is 10.63 percentage points higher than with the MFCC feature, and the syllable recognition rate is 49.04%, which is 25.1 percentage points higher. The DBN model also needs fewer iterations, so it is more efficient. In future work, we will study Tibetan continuous speech recognition based on the deep belief network model.
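
The reported gains appear to be absolute differences rather than relative improvements: subtracting each gain from its recognition rate gives the same MFCC baseline for both models. The short check below only rearranges the figures quoted in the abstract. The dictionary layout and the helper name `implied_baseline` are illustrative choices made here, and the baseline values are inferred, not stated in the source.

```python
from decimal import Decimal

# Figures quoted in the abstract: (recognition rate, reported gain), in percent.
reported = {
    "SA+MFCC": {"phoneme": ("69.05", "10.22"), "syllable": ("48.54", "24.6")},
    "DBN+MFCC": {"phoneme": ("69.46", "10.63"), "syllable": ("49.04", "25.1")},
}


def implied_baseline(rate, gain):
    """MFCC-only rate implied if the gain is an absolute difference (assumption)."""
    return Decimal(rate) - Decimal(gain)


# Compute the implied baseline for every feature type and recognition unit.
baselines = {
    feature: {unit: implied_baseline(*pair) for unit, pair in units.items()}
    for feature, units in reported.items()
}

for feature, units in baselines.items():
    for unit, value in units.items():
        print(f"{feature:9s} {unit:8s} implied MFCC baseline: {value}%")
```

If the gains were relative improvements instead, the two models would imply different baselines, so the agreement supports reading them as percentage points.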
Keywords/Search Tags: Speech recognition, Hidden Markov Model, Sparse autoencoder, Deep belief network, Acoustic model