Research On Continuous Speech Recognition Based On Deep Learning

Posted on: 2021-03-22 | Degree: Master | Type: Thesis
Country: China | Candidate: D F Shen | Full Text: PDF
GTID: 2518306512987369 | Subject: Computer application technology
Abstract/Summary:
Since the beginning of the 21st century, with the rapid development of computer technology and artificial intelligence, communication between humans and machines is no longer limited to the input and output of text symbols. With speech recognition technology, machines can understand what people say and even converse with them fluently. Research on speech recognition, and on continuous speech recognition in particular, has therefore become a hot topic. This thesis builds a continuous speech recognition system from three modules: automatic segmentation of continuous speech, an acoustic model, and a language model. The main work is as follows:

(1) Automatic segmentation of continuous speech. The thesis analyzes the features of the speech signal and selects suitable time-domain, frequency-domain, and cepstral-domain features as the basis for segmentation. First, the sound segments in the continuous speech are located by endpoint detection. Next, the voiced segments within the sound segments are found by detecting the pitch period trajectory. Subtracting the voiced segments from the sound segments yields the consonant segments, and a consonant marks the beginning of a syllable. Finally, because the energy of the speech signal differs across frequency bands, the spectrogram is divided into 5 frequency bands and the energy changes in each band are tracked in order to segment consecutive vowel syllables, as well as compound-vowel syllables and consonant syllables. The experimental results show that this method achieves a good segmentation effect.

(2) Acoustic models. An acoustic model based on the hidden Markov model (HMM) and an acoustic model based on deep learning are constructed. The 24-dimensional Mel-frequency cepstral coefficients (MFCCs) of the speech signal are extracted for training, and the same speech database is used for testing. The recognition accuracy and performance of several acoustic models are then compared. The experiments show that the acoustic model based on bidirectional long short-term memory (BLSTM) achieves a high recognition rate.

(3) Language model. The thesis constructs an N-gram language model, implements syllable-to-character conversion, and analyzes the advantages and disadvantages of the model. In addition, to improve the fault tolerance of the whole speech recognition system, further application experiments with the language model are carried out, and good results are achieved in text filling and text error correction.
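The syllable-to-character conversion in (3) can be illustrated with a minimal sketch. The abstract does not state the N-gram order, the smoothing method, or the decoding procedure, so everything below is an assumption for illustration only: a bigram model, add-one smoothing, Viterbi decoding, and a tiny hypothetical training corpus (`CORPUS`). The names `train`, `log_prob`, and `decode` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (syllable, character) pairs.
CORPUS = [
    [("wo", "我"), ("ai", "爱"), ("ni", "你")],
    [("wo", "我"), ("men", "们")],
    [("ni", "你"), ("men", "们")],
    [("wo", "窝")],
]

START = "<s>"  # sentence-start marker


def train(corpus):
    """Collect candidate characters per syllable plus unigram and bigram counts."""
    candidates = defaultdict(set)
    unigram, bigram = Counter(), Counter()
    for sentence in corpus:
        prev = START
        unigram[prev] += 1
        for syllable, char in sentence:
            candidates[syllable].add(char)
            unigram[char] += 1
            bigram[(prev, char)] += 1
            prev = char
    return candidates, unigram, bigram


def log_prob(prev, char, unigram, bigram):
    """Add-one smoothed bigram log-probability log P(char | prev)."""
    vocab_size = len(unigram)
    return math.log((bigram[(prev, char)] + 1) / (unigram[prev] + vocab_size))


def decode(syllables, candidates, unigram, bigram):
    """Viterbi search for the most probable character sequence."""
    best = {START: (0.0, [])}  # last character -> (log score, path so far)
    for syllable in syllables:
        if syllable not in candidates:
            raise KeyError(f"unknown syllable: {syllable}")
        new_best = {}
        for char in sorted(candidates[syllable]):
            # Pick the best predecessor for this candidate character.
            score, path = max(
                ((s + log_prob(prev, char, unigram, bigram), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[char] = (score, path + [char])
        best = new_best
    return "".join(max(best.values(), key=lambda item: item[0])[1])


MODEL = train(CORPUS)
print(decode(["wo", "ai", "ni"], *MODEL))
```

On this toy corpus the syllable "wo" has two candidate characters, and the bigram counts decide between them. A realistic system would need a large corpus and a stronger smoothing scheme than add-one.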
Keywords/Search Tags: Speech recognition, Speech segmentation, Deep learning, Acoustic model, BLSTM, Language model