Research On Tibetan Lhasa Dialect Speech Recognition Based On Deep Learning

Posted on: 2017-11-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y C Zhang
GTID: 2348330488470876  Subject: Electronic and communication engineering
Abstract/Summary:
Researchers have worked for many years to make machines understand human language and act on spoken commands. In recent years, deep learning algorithms have been widely used in many areas, driven by improvements in computing power and the emergence of big data. A deep learning network is an artificial neural network that contains many hidden layers, and it extracts acoustic features better than traditional acoustic feature extractors. Many researchers have already applied deep learning algorithms in their speech recognition systems. However, these methods have so far been adopted only in speech recognition systems for major languages and have not yet been applied to minority languages such as Tibetan. This thesis therefore introduces deep learning algorithms to Tibetan Lhasa dialect speech recognition. A Tibetan speech corpus of 51 isolated Tibetan words is designed as the training corpus. A deep learning network, the Long Short Term Memory (LSTM) network, is then used as a feature extractor to extract acoustic features from the Tibetan corpus. Finally, a Hidden Markov Model (HMM) is employed as the recognizer to perform speech recognition.

The main work and original contributions of the thesis are as follows:

Firstly, a Tibetan speech corpus of 51 isolated words in the Tibetan Lhasa dialect is built for Tibetan speech recognition. The 51 commonly used Tibetan words were selected from text materials to form a text corpus. A SAMPA-T (Tibetan) scheme for labeling the pronunciation of the Tibetan Lhasa dialect is designed by comparing the pronunciation of the Tibetan Lhasa dialect with that of Mandarin, with the aid of the existing Speech Assessment Methods Phonetic Alphabet for Standard Chinese (SAMPA-SC). The speech corpus was recorded by Tibetan Lhasa speakers and labeled manually. Four Tibetan Lhasa speakers were invited to record all 51 isolated words, and each speaker read each word 30 times, giving 6120 samples in total (4 speakers × 51 words × 30 repetitions).

Secondly, a feature extractor based on the LSTM network is established for extracting acoustic features from the Tibetan speech corpus. 13-dimensional Mel frequency cepstral coefficients (MFCC), along with their first and second differences, are extracted from the speech signal to obtain a 39-dimensional feature vector. The 51 output activations of the network, which correspond to the posterior probabilities of the 51 isolated words, are appended to the original 39-dimensional MFCC vector to compose a 90-dimensional acoustic feature vector. Principal component analysis (PCA) is then applied to the 90-dimensional feature vector, and the first 40 principal components are kept as the Tandem feature for HMM-based speech recognition.

Finally, Tibetan speech recognition is realized by combining the LSTM network and the HMM. The LSTM network is used as the Tibetan acoustic feature extractor, and the HMM is used for speech recognition. Experimental results show that the proposed method achieves a recognition rate of up to 80.56% on the test set.
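The Tandem feature construction described in the abstract (39-dimensional MFCC plus 51 network outputs, reduced to 40 dimensions by PCA) can be sketched roughly as below. This is only an illustration of the dimensions involved, not the thesis's implementation: the MFCC frames and LSTM outputs are random placeholders, the frame count of 200 is arbitrary, and the helper names `softmax` and `tandem_features` are hypothetical. PCA is done here with a plain SVD, and in practice the projection would presumably be fitted on training data and reused at test time.

```python
import numpy as np

N_MFCC = 39    # 13 MFCCs plus first and second differences (from the abstract)
N_WORDS = 51   # one network output activation per isolated word
N_TANDEM = 40  # principal components kept as the Tandem feature


def softmax(x):
    """Row-wise softmax, used only to fabricate posterior-like placeholders."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def tandem_features(mfcc, posteriors, n_components=N_TANDEM):
    """Concatenate MFCC frames with network outputs, then reduce with PCA."""
    combined = np.hstack([mfcc, posteriors])       # shape (frames, 90)
    centered = combined - combined.mean(axis=0)    # PCA needs zero-mean columns
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T          # shape (frames, 40)


rng = np.random.default_rng(0)
n_frames = 200                                     # arbitrary, for illustration
mfcc = rng.normal(size=(n_frames, N_MFCC))         # placeholder for real MFCC frames
posteriors = softmax(rng.normal(size=(n_frames, N_WORDS)))  # placeholder for LSTM outputs

tandem = tandem_features(mfcc, posteriors)
print(tandem.shape)

# Corpus size stated in the abstract: 4 speakers x 51 words x 30 repetitions.
corpus_size = 4 * 51 * 30
print(corpus_size)
```

The resulting 40-dimensional frames would then be passed to the HMM recognizer in place of the raw MFCC vectors.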
Keywords/Search Tags: Tibetan speech recognition, deep learning, Long Short Term Memory network, Hidden Markov Model, Tandem feature