Research On And Implementation Of Continuous Speech Recognition System

Posted on: 2017-04-19  Degree: Master  Type: Thesis
Country: China  Candidate: Z R Lu
GTID: 2348330512464976  Subject: Electronic and communication engineering
Abstract/Summary:
Speech recognition technology, also known as Automatic Speech Recognition (ASR), gives machines the ability to understand human language. After more than half a century of development, the technology has become increasingly mature. It has been applied in many fields, such as voice dialing, voice document retrieval, voice chat assistants, simultaneous translation, smart homes, medical services, industrial control, and voice communication systems. Many voice-controlled systems exist, such as Siri and the Sogou voice assistant, which can be operated by voice and understand spoken language. As the interaction interface, the speech recognition system plays an important role in such applications and determines their quality. However well designed a voice assistant is, the application is just a bubble without an excellent speech recognition system. Therefore, as a key technology for free human-machine interaction, speech recognition is worthy of in-depth study. In this thesis, we combine hidden Markov model theory, deep neural network theory, and the HTK toolkit (Hidden Markov Model Toolkit) to build an IP voice dialing system. The work and contributions are summarized as follows:

1. The research background and significance of speech recognition and its development status are summarized. The preprocessing of speech signals is introduced, and the key technologies involved in speech recognition are studied in depth.

2. We first wrote a script to generate 25 random texts, each containing 50 sentences in random IP format. We then assigned 25 people (12 men and 13 women) to record the corresponding texts, which gave a speech database of 1250 sentences. Of these, 1000 sentences are used as training samples (the corpus), while the remaining 250 sentences are used as test samples. All recordings are stored in WAV format, the common audio format on Windows systems.

3. We used the 1000 training samples to build an IP voice dialing system on the HTK platform. Four types of hidden Markov model were trained: the monophone HMM, the triphone HMM, the tied-state triphone HMM, and the DNN-HMM. We then compared the four models experimentally in terms of word and sentence recognition rates. The DNN-HMM model achieved the highest word and sentence recognition rates, from which we concluded that the neural network model performs better than the traditional HMM for speech recognition. However, the DNN-HMM is a deep model with very high complexity. Compared with the other three models, it took more time in training and decoding on the same speech data, and therefore places higher demands on hardware computing power.

4. A typical IP address is composed of four fields (such as 210.52.207.2), and each field has a maximum value of 255. We designed a language model according to this characteristic of IP addresses. With this strategy (applying the language model), the recognition performance of all four trained models improved greatly on the 250-sentence test corpus. This shows that the performance of a speech recognition system can be improved effectively by using an appropriate language model.
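The prompt-generation script and the IP field constraint described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual script: the function names (`random_ip`, `generate_prompt_sheets`, `is_valid_ip`, `pick_best`), the fixed random seed, and the assumption that each field ranges over 0 to 255 are hypothetical, and the abstract does not say how the language model was encoded in HTK. The sketch only shows the constraint itself, applied as a filter over a ranked list of recognition hypotheses.

```python
import random
import re

NUM_TEXTS = 25            # one prompt text per speaker (from the abstract)
SENTENCES_PER_TEXT = 50   # sentences per text (from the abstract)


def random_ip(rng):
    """Return a dotted string of four fields, each assumed to lie in 0..255."""
    return ".".join(str(rng.randint(0, 255)) for _ in range(4))


def generate_prompt_sheets(seed=0):
    """Build 25 texts of 50 random IP-format sentences (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [
        [random_ip(rng) for _ in range(SENTENCES_PER_TEXT)]
        for _ in range(NUM_TEXTS)
    ]


def is_valid_ip(text):
    """Check the constraint: exactly four numeric fields, none above 255."""
    fields = text.split(".")
    if len(fields) != 4:
        return False
    return all(
        re.fullmatch(r"[0-9]{1,3}", f) is not None and int(f) <= 255
        for f in fields
    )


def pick_best(ranked_hypotheses):
    """Return the first hypothesis that satisfies the IP constraint, else None.

    Stands in for constraining the recognizer's output; the real system
    would apply the language model during decoding, not afterwards.
    """
    for hyp in ranked_hypotheses:
        if is_valid_ip(hyp):
            return hyp
    return None


sheets = generate_prompt_sheets()
total_sentences = sum(len(sheet) for sheet in sheets)
```

Here `total_sentences` matches the 1250 sentences of the recorded database, and `pick_best(["300.1.1.1", "210.52.207.2"])` skips the first hypothesis because its first field exceeds 255.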
Keywords/Search Tags:Speech recognition, Hidden Markov Model, Gaussian Mixture Model, Deep Neural Network, HTK