Yi Language Speech Recognition Using Deep Learning Methods

Posted on:2022-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Chen

GTID:2518306500956389

Subject:Master of Engineering

Abstract/Summary:

Artificial intelligence is a bellwether for the development of science and technology and for lifestyle changes in the 21st century. Speech recognition, which converts speech into text to enable human-computer interaction, is currently one of the more successfully deployed artificial intelligence technologies. Current speech recognition research focuses mainly on mainstream languages and on some minority languages such as Tibetan and Donggan, and there is comparatively little research on speech recognition for the minority Yi language. This thesis therefore studies continuous speech recognition of the Yi language based on deep learning methods. The main work and original contributions of this thesis are as follows.

Firstly, a Yi language corpus and a Yi and Chinese hybrid corpus are established. For the Yi language corpus, 5383 sentences reflecting the characteristics of the Yi phonetic system are designed according to the pronunciation characteristics of the Yi language. Each Yi word is annotated with its initial consonant, final vowel and tone. A female speaker is invited to record the sentences, which are saved as wav files with a 16 kHz sampling rate and 16-bit sample precision. The recorded sentences are segmented, proofread and checked, and sentences that do not meet the recording quality requirements are re-recorded. The recorded Yi language corpus lasts about 5 hours. For the Yi and Chinese hybrid corpus, the phonetic systems and pronunciation characteristics of Yi and Chinese are compared, and about 28 hours of Tsinghua Chinese speech from the open thchs30 corpus are added to the original Yi language corpus. In the hybrid corpus, the initials and finals of characters are unified using international phonetic symbols, and the tones of characters are marked with numbers.

Secondly, four acoustic modeling methods are used to implement Yi speech recognition, namely the Hidden Markov Model (HMM), Deep Neural Network (DNN), Time Delay Neural Network (TDNN) and End-to-End acoustic models, and comparative experiments are carried out. The experimental results show that the word error rate of Yi speech recognition based on deep learning methods is lower than that of the traditional HMM method. The word error rate of the TDNN model is as low as 16.5%, while the End-to-End model has the worst recognition performance, with a word error rate of 47.40%, because of the lack of Yi language corpus data.

Thirdly, the influence of three different acoustic features on the word error rate of Yi speech recognition is studied. Based on the DNN model, this thesis implements Yi speech recognition with Mel frequency cepstral coefficient (MFCC) features, bottleneck features and compound bottleneck features. The results show that the word error rate of Yi speech recognition based on MFCC features is low. Considering that the lack of corpus data may affect the experimental results, this thesis uses the thchs30 Chinese corpus to assist model training for verification. The experimental results show that the word error rate with compound features is the lowest, at 15.01%.

Finally, Yi continuous speech recognition based on transfer learning is implemented. The similarity between the Yi language and the Chinese language is measured using the Jaccard similarity coefficient. The thchs30 Chinese corpus is selected as the source domain for transfer learning, and a transfer learning experiment with the DNN-HMM model is then carried out for the Yi language. The experimental results show that the word error rate of Yi speech recognition using transfer learning is 3.16% lower than that of the DNN model.
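The two quantities the abstract relies on, word error rate and the Jaccard similarity coefficient, can be illustrated with a minimal Python sketch. It uses the standard definitions only: word error rate as token-level edit distance divided by the reference length, and Jaccard similarity as the size of the intersection of two sets divided by the size of their union. The function names, the whitespace tokenization and the toy symbol sets are assumptions made for illustration. The abstract does not state which scoring unit or which sets (for example phoneme inventories) the thesis actually uses, and the symbols below are placeholders, not real Yi or Chinese data.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length.

    Tokens are obtained by splitting on whitespace (an assumption; the
    thesis may score on a different unit).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens matches an empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Toy transcripts: one substitution and one deletion over 4 tokens.
    print(word_error_rate("a b c d", "a x c"))

    # Placeholder symbol inventories (hypothetical, not real language data).
    inventory_1 = {"p", "t", "k", "a", "i"}
    inventory_2 = {"p", "t", "a", "u"}
    print(jaccard_similarity(inventory_1, inventory_2))
```

Because insertions are counted, a word error rate computed this way can exceed 100% when the hypothesis is much longer than the reference. A Jaccard value closer to 1 indicates more overlap between the two sets being compared.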

Keywords/Search Tags:

Speech Recognition, Yi Language, TDNN, Bottleneck Feature, Transfer Learning

Related items

1	Noise Robust Speech Recognition Based On CNN-TDNN And Transfer Learning
2	Research On Cross-language Tibetan Lhasa Speech Recognition Based On Transfer Learnin
3	Design And Implementation Of Deep Learning-based Open Speech System For Innovative Enterprises
4	Research On Emotion Recognition Of Monomodal Speech And Multimodal Speech Vision Based On Transfer Learning
5	Research On Tibetan Non-specific Continuous Speech Recognition Based On Deep Learning
6	Cross-language End-to-end Speech Recognition Research For Endangered Language
7	Research On Low-Resource Speech Recognition Based On The Transfer Learning And Fusion Of Language Models
8	Language Recognition System Based On Bottleneck Features
9	A Study On Low-resource Multilingual Speech Recognition Based On Transfer Learning
10	Research On Speech Emotion Recognition Methods Based On Deep Learning And Transfer Learning