
Luo Ping Dialect Speech Recognition Research Based On Kaldi

Posted on: 2019-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
GTID: 2428330548468878
Subject: Communication and Information System
Abstract:
Speech recognition is a principal means of human-computer interaction. In recent years, with advances in science and technology, speech recognition has been applied to many aspects of daily life. However, current Chinese speech recognition systems are built for Putonghua (Standard Mandarin), whereas a country as large and ethnically diverse as China has many dialects. Systems based on Putonghua alone fall far short of public needs, so research on regional dialects, and applications for them, are particularly necessary.

This thesis briefly reviews the history of speech recognition, explains the basic principles of speech recognition technology, and analyzes how each component technology has contributed to the field's development. It studies the technologies involved in the whole pipeline, from acquiring the original analog speech signal to constructing the language model and the acoustic model. The discussion focuses on acoustic modeling; the acoustic models studied include the monophone model, the triphone model, the optimized triphone model, the hidden Markov model (HMM), and the deep neural network (DNN) model. For language modeling, the thesis mainly studies statistical N-gram models.

Finally, the thesis analyzes the characteristics of the Luo Ping dialect and builds a speech recognition system for it based on Kaldi. Five sets of comparative experiments were conducted to compare system accuracy across different acoustic models, different language models, and different amounts of training data. The results show that, among the six acoustic models tested, the DNN-based acoustic model achieves the highest accuracy, 96.82%, and that the bigram language model outperforms the unigram model. In the bigram experiments, recognition accuracy rose steadily as the training set grew from 1980 to 2420 samples, indicating that, over this range, accuracy increases with the amount of training data. On this basis, the training and test samples were re-partitioned, and the results show that the system adapts well.
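The abstract reports that the bigram language model beat the unigram model. The sketch below illustrates why a bigram model usually helps: it conditions each word on the previous one, so it can exploit word order that a unigram model ignores. It is a minimal sketch under stated assumptions, not the thesis's implementation: the tiny corpus is hypothetical (not Luo Ping data), add-one (Laplace) smoothing is an assumed choice since the abstract does not name a smoothing method, and it measures perplexity, an intrinsic language-model score, rather than the recognition accuracy the thesis reports. The function names (`train`, `p_unigram`, `p_bigram`, `perplexity`) are illustrative. A real Kaldi recipe would build the N-gram with a dedicated language-modelling toolkit and compile it into the decoding graph.

```python
import math
from collections import Counter

# Hypothetical toy transcripts (whitespace-tokenized); NOT data from the thesis.
train_sents = [
    "turn on the light".split(),
    "turn off the light".split(),
    "turn on the fan".split(),
]
test_sents = ["turn on the light".split()]


def train(sentences):
    """Count unigrams, bigrams and bigram histories, padding with <s> and </s>."""
    uni, bi, hist = Counter(), Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            uni[cur] += 1          # every predicted token (includes </s>, never <s>)
            bi[(prev, cur)] += 1   # how often cur follows prev
            hist[prev] += 1        # how often prev serves as a history
    return uni, bi, hist


uni, bi, hist = train(train_sents)
V = len(uni)               # vocabulary size of predictable tokens
N = sum(uni.values())      # total number of predicted tokens


def p_unigram(prev, cur):
    """Add-one smoothed unigram probability; ignores the history prev."""
    return (uni[cur] + 1) / (N + V)


def p_bigram(prev, cur):
    """Add-one smoothed bigram probability P(cur | prev)."""
    return (bi[(prev, cur)] + 1) / (hist[prev] + V)


def perplexity(sentences, prob):
    """exp of the average negative log-probability per predicted token."""
    log_sum, n = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            log_sum += math.log(prob(prev, cur))
            n += 1
    return math.exp(-log_sum / n)


ppl_uni = perplexity(test_sents, p_unigram)
ppl_bi = perplexity(test_sents, p_bigram)
print(f"unigram perplexity: {ppl_uni:.2f}")
print(f"bigram  perplexity: {ppl_bi:.2f}")
```

On this toy corpus the bigram model assigns the test sentence a lower perplexity than the unigram model, because pairs such as "turn on" and "the light" were seen in training. Lower perplexity generally, though not always, translates into fewer recognition errors once the model is used in decoding.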
Keywords: Luo Ping dialect, DNN, Kaldi, speech recognition