
Development Of Offline Speech Recognition System Based On Deep Learning

Posted on: 2021-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zang
Full Text: PDF
GTID: 2428330620976928
Subject: Control engineering
Abstract/Summary:
Speech and text play irreplaceable roles in human communication. Speech is the most natural mode of human communication, while text is more convenient for storage, reading, and human-computer interaction, so converting speech into text has long been a popular research focus. Chinese is the language with the largest number of speakers in the world, and Chinese speech recognition must handle not only a large number of synonyms and homophones but also inaccurate pronunciation caused by vowels and tones, which makes recognition complicated and difficult. Based on deep learning methods, this research investigates a speech recognition system to establish a reliable and accurate speech-to-text conversion model. The work comprises the following three aspects:

(1) By analyzing Chinese speech data preprocessing and feature extraction methods, the research confirms that applying a Hamming window provides a higher-quality spectrum for subsequent feature extraction. By comparing spectrogram features, filter-bank features, and MFCC features, both in detail and in actual modeling, the research finds that recognition accuracy is higher when spectrogram features are used as the input of the acoustic model.

(2) To address the training complexity of traditional speech recognition systems, the time-consuming and laborious data labeling they require, and their low accuracy, deep learning is combined with the CTC algorithm to build the acoustic model. Using convolutional neural networks to build the model effectively speeds up training and reduces the number of spatial parameters. Using the CTC algorithm for likelihood optimization avoids frame-level data labeling and reduces the complexity of model training. Batch normalization, residual modules, and other optimization methods improve the accuracy of the acoustic model, and fine-tuning further improves the accuracy of the single-speaker model.

(3) Because fully end-to-end speech recognition systems require too much data, this thesis uses a non-fully-end-to-end framework: the acoustic model converts speech to Pinyin, and the language model converts Pinyin to text, sacrificing a small amount of decoding speed for improved accuracy. Using a neural network language model instead of the mainstream statistical language models solves the problems of overly large spatial parameters and data sparsity, and improves the accuracy of Pinyin-to-Chinese conversion.

This research builds an offline speech recognition system based on open-source Chinese speech data sets. It focuses on the construction and prediction of acoustic models that combine convolutional neural networks with the CTC algorithm, using batch normalization, residual connection modules, and other optimization strategies to effectively reduce errors. As a result, the error rate of speech-to-Pinyin conversion is about 15% on the test set. Using the neural network language model for Pinyin-to-text conversion greatly reduces the accuracy loss, bringing the final recognition accuracy to around 84%. Finally, the research constructs an offline speech recognition software platform using a server-client interaction mode, which facilitates the collection of voice data sets and updates of the underlying model, greatly improving the user experience.
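The abstract does not include the preprocessing code itself; as an illustration of the Hamming-window framing it describes in aspect (1), the sketch below uses common defaults (16 kHz audio, 25 ms frames with a 10 ms hop) that are assumptions, not values confirmed by the thesis.

```python
import math

def hamming_window(frame_len: int) -> list:
    """Standard Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)).
    Tapering the frame edges reduces spectral leakage before the FFT."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]

def frame_signal(signal, frame_len=400, hop=160):
    """Split a waveform into overlapping frames and apply the window.
    400 samples / 160-sample hop = 25 ms frames / 10 ms hop at 16 kHz."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

Each windowed frame would then be passed to an FFT to form one column of the spectrogram that the thesis feeds to the acoustic model.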
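The CTC algorithm in aspect (2) lets the acoustic model output a Pinyin sequence without frame-level labels; at inference time, a greedy (best-path) decode collapses repeated symbols and removes blanks. A minimal sketch of that collapse rule, with illustrative Pinyin tokens and "_" assumed as the blank symbol:

```python
def ctc_greedy_collapse(path, blank="_"):
    """Collapse a best-path CTC output: merge consecutive repeats,
    then drop blank symbols. Blanks separate genuine repetitions."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out

# A frame-wise argmax like ["ni", "ni", "_", "hao", "hao"]
# collapses to the Pinyin sequence ["ni", "hao"].
```

Note that a blank between two identical symbols preserves the repetition: ["a", "_", "a"] collapses to ["a", "a"], not ["a"].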
Keywords/Search Tags: Speech Recognition, Deep Learning, Acoustic Model, Language Model, Offline Software Platform