Research On Tibetan Lhasa Dialect Speech Recognition Based On Deep Learning

Posted on:2017-11-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Zhang

Full Text:PDF

GTID:2348330488470876

Subject:Electronic and communication engineering

Abstract/Summary:

Researchers have worked for many years to make machines understand human language and act on spoken commands. In recent years, with the growth in computing power and the emergence of big data, deep learning algorithms have been widely used in many areas. A deep learning network is an artificial neural network that contains many hidden layers, and it extracts acoustic features better than traditional acoustic feature extractors. Many researchers have already applied deep learning algorithms in their speech recognition systems. However, these methods have so far been adopted only in speech recognition systems for major languages and have not yet been applied to minority languages such as Tibetan. This thesis therefore introduces deep learning algorithms to Tibetan Lhasa dialect speech recognition. A Tibetan speech corpus of 51 isolated words is designed as the training corpus. A deep learning network, the Long Short-Term Memory (LSTM) network, is then used as a feature extractor to extract acoustic features from the corpus. Finally, a Hidden Markov Model (HMM) is employed as the recognizer.

The main work and original contributions of the thesis are as follows.

Firstly, a speech corpus of 51 isolated words of the Tibetan Lhasa dialect is built for Tibetan speech recognition. 51 commonly used Tibetan words were selected from text materials as the text corpus. A SAMPA-T (Tibetan) scheme for labeling the pronunciation of the Tibetan Lhasa dialect is designed by comparing the pronunciation of the Tibetan Lhasa dialect with that of Mandarin, with the aid of the existing Speech Assessment Methods Phonetic Alphabet for Standard Chinese (SAMPA-SC). The speech corpus was recorded by Tibetan Lhasa speakers and labeled manually. Four Tibetan Lhasa speakers were invited to record all 51 isolated words, and each speaker read each word 30 times.
In total, 6120 samples were obtained (4 speakers × 51 words × 30 repetitions).

Secondly, a feature extractor based on the long short-term memory (LSTM) network is established to extract acoustic features from the Tibetan speech corpus. 13-dimensional Mel-frequency cepstral coefficients (MFCCs), along with their first and second differences, are extracted from the speech signal to form a 39-dimensional feature vector. The 51 output activations of the network, which correspond to the posterior probabilities of the 51 isolated words, are appended to the original 39-dimensional MFCC vector to compose a 90-dimensional acoustic feature vector. Principal component analysis (PCA) is then applied to the 90-dimensional vector, and the first 40 principal components are retained as the Tandem feature for HMM-based speech recognition.

Finally, Tibetan speech recognition is realized by combining the LSTM network and the HMM: the LSTM network serves as the acoustic feature extractor, and the HMM serves as the recognizer. Experimental results show that the proposed method achieves a recognition rate of up to 80.56% on the test set.
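
The dimensions quoted in the abstract (13 to 39 to 90 to 40) can be traced with a short NumPy sketch. This is an illustration only, not the thesis's implementation: the MFCCs and network outputs are random stand-ins, the function names `add_deltas` and `tandem_features` are hypothetical, the differences are approximated with `np.gradient`, and the PCA is fitted on the same frames it transforms, whereas a real system would fit it on training data.

```python
import numpy as np


def add_deltas(static):
    """Append first and second differences along the time axis.

    static: (frames, 13) array -> returns (frames, 39).
    Assumption: central differences via np.gradient; the abstract
    does not say how the differences were computed.
    """
    delta = np.gradient(static, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([static, delta, delta2])


def tandem_features(mfcc39, posteriors, n_components=40):
    """Concatenate MFCCs with word posteriors, then reduce with PCA.

    mfcc39: (frames, 39), posteriors: (frames, 51) -> (frames, 40).
    """
    combined = np.hstack([mfcc39, posteriors])      # (frames, 90)
    centered = combined - combined.mean(axis=0)
    # PCA via SVD: rows of vt are principal directions, ordered by
    # decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
n_frames = 200                                      # arbitrary frame count

# Stand-in for real 13-dimensional MFCCs.
static = rng.normal(size=(n_frames, 13))
mfcc39 = add_deltas(static)

# Stand-in for the 51 LSTM output activations (softmax over 51 words).
logits = rng.normal(size=(n_frames, 51))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

tandem = tandem_features(mfcc39, posteriors)
print(mfcc39.shape, tandem.shape)                   # (200, 39) (200, 40)
```

The resulting 40-dimensional vectors play the role of the Tandem feature that would be passed to the HMM recognizer.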

Keywords/Search Tags:

Tibetan speech recognition, deep learning, Long Short Term Memory network, Hidden Markov Model, Tandem Feature

Related items

1	Research On Tibetan Lhasa Dialect Speech Recognition Based On Deep Learning
2	Research On Uyghur Speech Recognition Based On Deep Learning
3	Amdo Tibetan Speech Recognition Based On Deep Neural Network
4	Research And Application Of Speech Emotion Recognition Algorithm Based On Deep Learning
5	Speech Emotion Recognition Based On Deep Learning Technology
6	Research On Algorithms Of Speech Sentence Recognition Based On Deep Learning
7	Research On Connectionist Temporal Classification In Speech Recognition
8	Deep Learning For Spoken Term Detection
9	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
10	Research On The Conversion Of Bone Conduction Speech To Normal Speech Based On Deep Learning