Research On Tibetan Lhasa Dialect Speech Recognition Based On TANDEM Feature

Posted on: 2019-08-09

Degree: Master

Type: Thesis

Country: China

Candidate: J X Wu

GTID: 2428330545981735

Subject: Electronic and communication engineering

Abstract/Summary:

Language is the most important bridge for human life and communication, and speech is the most important way to transmit information, so speech recognition is particularly important in human society. Speech recognition for the major languages has already achieved good results, and its applications have gradually entered people's lives through the Internet of Things and smart home systems. However, some regional dialects and minority languages (such as Tibetan) lack speech corpus resources, and experimental speech data for them is very difficult to obtain. To address the poor performance of Tibetan speech recognition systems caused by this lack of data, this thesis proposes a Tibetan speech recognition method based on TANDEM features under low-resource conditions. To improve recognition of Tibetan, the thesis introduces TANDEM features generated by Long Short-Term Memory (LSTM) network models into Tibetan Lhasa dialect speech recognition. The main work and original contributions of the thesis are as follows.

Firstly, at the acoustic model level, an LSTM model was constructed as a Tibetan acoustic feature extractor. The network includes an input layer and an output layer, and its last layer is a post-output layer, which not only evaluates the objective function during forward propagation but also passes the error back to the output layer. To enable the network to model long-term sequences and make full use of context information, the hidden part consists of a three-layer BLSTM (bidirectional LSTM). A linear projection layer reduces the number of model parameters and speeds up training.

Secondly, at the acoustic feature level, the LSTM-based Tibetan acoustic feature extractor is used to obtain more discriminative TANDEM features. To produce TANDEM features, the LSTM classifies the frames of the training corpus, and the network weights are trained with the back-propagation algorithm under the minimum cross-entropy criterion to estimate phoneme-level posterior probabilities. Instead of using the LSTM output directly, the values of a relatively narrow layer of the network are taken as the acoustic features. Conventional GMM-HMM training and decoding are then applied to these features.

Finally, a Tibetan speech recognition system was realized by combining the TANDEM-LSTM network with an HMM: the LSTM network serves as the Tibetan acoustic feature extractor, and the HMM performs the recognition. Experimental results show that the proposed method effectively improves recognition performance under low-resource conditions: compared with the GMM-HMM baseline system, the word error rate (WER) on the test set is reduced by 15%.
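
The WER figure quoted in the abstract can be made concrete with a small sketch. The abstract does not say whether the 15% reduction is absolute or relative, nor which scoring tool or token unit (word, syllable) was used, so everything below is an illustrative assumption: `word_error_rate` and `relative_reduction` are hypothetical helper names, tokens are simply whitespace-separated, and the numbers in the usage note are invented for the example rather than taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference tokens.

    Assumption: tokens are whitespace-separated; the thesis may score a
    different unit (for example syllables).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference tokens
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of a new system with respect to a baseline."""
    return (baseline_wer - new_wer) / baseline_wer
```

As a usage example with made-up values, a baseline WER of 0.40 and a new WER of 0.34 give `relative_reduction(0.40, 0.34)` of about 0.15, that is a 15% relative reduction but only a 6-point absolute reduction. The two readings differ considerably, which is why it helps to state which one is meant.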

Keywords/Search Tags:

Tibetan speech recognition, TANDEM Feature, Long Short Term Memory network, Hidden Markov Model, deep learning

Related items

1	Research On Tibetan Lhasa Dialect Speech Recognition Based On Deep Learning
2	Amdo Tibetan Speech Recognition Based On Deep Neural Network
3	Research On Uyghur Speech Recognition Based On Deep Learning
4	Research And Application Of Speech Emotion Recognition Algorithm Based On Deep Learning
5	Speech Emotion Recognition Based On Deep Learning Technology
6	Research On Algorithms Of Speech Sentence Recognition Based On Deep Learning
7	Research On Connectionist Temporal Classification In Speech Recognition
8	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
9	Deep Learning For Spoken Term Detection
10	Research And Application Of The Short-term Memory Network For Adjusting Gate Length