Research On Uyghur Speech Recognition Based On Deep Learning

Posted on:2017-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:P F Li

Full Text:PDF

GTID:2308330485463744

Subject:Communication and Information System

Abstract/Summary:

Speech recognition converts a raw speech signal into the corresponding text or another form that a computer can process. It is an important field of artificial intelligence research with high commercial and research value. In recent years, deep learning has risen to prominence in machine learning and pattern recognition. Its superior modeling capability lets it "learn" effective information from vast amounts of data, which has attracted the attention of numerous researchers in China and abroad. Deep learning has also been applied to speech recognition with good results: the Hidden Markov Model (HMM) based on the deep neural network (DNN) is rapidly replacing the HMM based on the Gaussian Mixture Model (GMM) and has become the standard configuration of today's speech recognition systems.
Conventional speech recognition research has concentrated on languages with large speaker populations or wide use, such as Chinese, English, and Arabic. Techniques developed in that research can be generalized, with little change, to other languages that share similar characteristics. After decades of development, speech recognition technology for these languages has matured. However, speech recognition for some minority languages, such as Uyghur, has not received widespread attention or development abroad. With the rapid economic development of Xinjiang, China, the need for a Uyghur speech recognition system has grown, and its broad market prospects cannot be ignored.
In this thesis, we analyze in detail the network models and modeling methods based on deep learning, and we apply deep-learning-based speech recognition technology to the recognition of Uyghur.
First, we study the acoustic model based on DNN-HMM. The acoustic model (AM) is the most important part of a speech recognition system, and a good AM improves the performance of the whole system. This thesis first introduces the network structure and training algorithm of the DNN, then trains DNN-HMM acoustic models on two Uyghur speech data sets of 300 hours and 500 hours. In the experiments, the acoustic model trained on 500 hours achieves a 3.03% relative reduction in word error rate compared with the model trained on 300 hours. This indicates that the larger the training set, the higher the recognition rate of the acoustic model.
Second, we study the acoustic model based on LSTM-HMM. We introduce the Recurrent Neural Network (RNN) and, because the RNN suffers from the vanishing gradient problem, go on to discuss the Long Short-Term Memory (LSTM) network. In the experiments, the LSTM-HMM acoustic model achieves a 12.49% relative reduction in word error rate compared with the DNN-HMM acoustic model. This indicates that the LSTM-HMM acoustic model performs considerably better than the DNN-HMM acoustic model.
Third, we study the optimization of the Uyghur language model. Uyghur is an agglutinative language, and its very large vocabulary causes problems for a traditional whole-word language model, such as data sparseness and poor robustness. By modeling sub-word units, we optimized the Uyghur language model and reduced these problems. As a result, the word error rate decreased by a relative 2.4%.
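The results above are all stated as relative changes in word error rate (WER). As a minimal sketch of how such figures are usually computed, the Python below derives WER from a word-level edit distance and then the relative reduction between two systems. The function names and the numeric values are assumptions chosen for illustration; the abstract does not report absolute error rates or the scoring tool that was used.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; it starts as the distance from an empty
    # reference, which is j insertions.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # i deletions turn reference[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(reference)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the thesis.
    ref = "a b c d".split()
    hyp = "a x c".split()
    print(f"WER: {word_error_rate(ref, hyp):.2%}")

    # Hypothetical absolute error rates for a baseline and an improved model.
    print(f"Relative reduction: {relative_reduction(0.20, 0.175):.2%}")
```

A relative reduction is measured against the baseline's own error rate, so a 12.49% relative reduction means the improved model makes about one eighth fewer word errors than the baseline, not that its WER is 12.49 percentage points lower.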

Keywords/Search Tags:

Speech recognition, Deep learning, Acoustic Model, Deep Neural Network, Long Short-Term Memory Network, Language Model

Related items

1	Research On Mandarin Speech Recognition Technology Based On Deep Neural Network
2	Research On Algorithms Of Speech Sentence Recognition Based On Deep Learning
3	Speech Emotion Recognition Based On Deep Learning Technology
4	Research On Tibetan Lhasa Dialect Speech Recognition Based On Deep Learning
5	Research On Lip Language Recognition Based On Deep Learning
6	Amdo Tibetan Speech Recognition Based On Deep Neural Network
7	Research On Sign Language Recognition Method Based On Deep Learning Algorithms
8	Non-specific Human Sign Language Recognition Based On Deep Learning
9	Deep Learning For Spoken Term Detection
10	Research And Application Of Speech Emotion Recognition Algorithm Based On Deep Learning