
Design Of Embedded Continuous Speech Recognition System Based On Deep Learning

Posted on: 2021-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: P. F. Wu
Full Text: PDF
GTID: 2518306569497934
Subject: IC Engineering

Abstract/Summary:
As an important component of modern human-computer interaction, continuous speech recognition is gradually replacing traditional input methods such as the mouse and keyboard, and it is in demand across a wide range of fields. Because continuous speech recognition systems involve complex models and require substantial computing resources, most current methods still perform recognition in the cloud, which makes user privacy difficult to guarantee and greatly restricts practical application scenarios. To address this problem, this thesis uses deep learning to build an acoustic model locally and ports the model to an embedded development board, completing the algorithm implementation on a terminal hardware platform.

For the speech signal processing stage, this thesis adopts Mel filter bank features as the input features of the speech recognition system. The processing pipeline for the speech signal is analyzed in detail and applied to the speech data set, and the Mel-frequency cepstral coefficients (MFCCs) of the speech signal are compared with the Mel filter bank features. Experiments show that, in a speech recognition system built on a deep learning framework, filter bank features represent speech information better and reduce the computational cost of the signal processing module.

For network construction and training, this thesis combines long short-term memory (LSTM) networks with deep neural networks to build the network modules. Speech recognition is fundamentally a time-series modeling problem, and in handling time series a recurrent neural network has a "memory" mechanism that improves the model's recognition performance. The thesis analyzes the modeling principles of traditional speech recognition algorithms, deep neural networks, recurrent neural networks, and other network architectures, and demonstrates experimentally that LSTM networks achieve high recognition accuracy on speech recognition problems.

When traditional speech recognition algorithms relate speech to labels, they require not only the text corresponding to a segment of audio but also the label for each frame in the time series, and the alignment must be refined over repeated iterations to be accurate. For an isolated-word recognition system with short utterances and a small vocabulary, frame labels can be estimated from the short-time stationarity of speech; for a continuous speech recognition system, where utterances are long and the vocabulary is large, such estimated labeling is no longer applicable. This thesis therefore adopts Connectionist Temporal Classification (CTC), which maps the audio feature sequence directly to the final recognition labels, enumerates the label paths that each label sequence may correspond to, and uses speech without frame-level labels as the training set. CTC training introduces a new loss function that lets the LSTM network learn directly from audio sequence data, improving the training efficiency of the system. In addition, when testing model accuracy, this thesis uses CTC decoding: the three largest values in each frame's label probability distribution are selected for a beam search, and the decoding result is obtained after de-duplication. Experiments show that combining CTC's end-to-end recognition method with deep learning improves both recognition efficiency and accuracy.

To verify the performance of the speech recognition system in real scenarios, this thesis ports the system designed on the PC to an embedded development board and carries out experimental analysis and testing. The hardware platform is the ALIENTEK i.MX6U development board; the algorithm is optimized for the board's hardware resources to complete deployment and verification of the speech recognition system on the board. Experiments show that the deep-learning-based speech recognition system is well suited to porting onto a terminal hardware platform, effectively improves the speech recognition rate, and achieves the expected effect.
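To make the filter bank feature extraction concrete, the following is a minimal sketch of how triangular Mel filters are typically built and applied to one frame's power spectrum. It is an illustrative reconstruction, not the thesis's code; the parameter values (26 filters, 512-point FFT, 16 kHz sampling) are assumptions.

```python
import math

def hz_to_mel(hz):
    # Standard HTK-style Hz -> Mel conversion.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000,
                   low_hz=0.0, high_hz=8000.0):
    """Build triangular Mel filters over the FFT bin centres."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    # n_filters + 2 points equally spaced on the Mel scale
    mel_points = [low_mel + i * (high_mel - low_mel) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    # Map each Mel point back to an FFT bin index
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    n_bins = n_fft // 2 + 1
    filters = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        f = [0.0] * n_bins
        for k in range(left, centre):       # rising slope
            if centre > left:
                f[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling slope
            if right > centre:
                f[k] = (right - k) / (right - centre)
        filters.append(f)
    return filters

def log_fbank(power_spectrum, filters, eps=1e-10):
    """Apply the filters to one frame's power spectrum, take log energies."""
    return [math.log(max(sum(w * p for w, p in zip(f, power_spectrum)), eps))
            for f in filters]
```

Because the log filter bank energies skip the discrete cosine transform that MFCC extraction adds on top, this form both preserves more spectral detail for the network and saves computation, consistent with the comparison described above.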
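The "memory" mechanism of the recurrent layers can be seen in a single LSTM time step: the forget, input, and output gates decide how much of the previous cell state to keep, how much new information to write, and how much to expose. The sketch below is a textbook LSTM cell in plain Python for clarity, not the thesis's network; the weight layout (per-gate dicts `W`, `U`, `b`) is an assumption made for readability.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    x: input vector; h_prev, c_prev: previous hidden and cell state.
    W, U, b: per-gate weights keyed by 'i', 'f', 'o', 'g', where
    W[g] is hidden x input, U[g] is hidden x hidden, b[g] is hidden.
    """
    n = len(h_prev)
    def affine(gate):
        return [sum(W[gate][r][k] * x[k] for k in range(len(x)))
                + sum(U[gate][r][k] * h_prev[k] for k in range(n))
                + b[gate][r]
                for r in range(n)]
    i = [sigmoid(v) for v in affine('i')]    # input gate
    f = [sigmoid(v) for v in affine('f')]    # forget gate
    o = [sigmoid(v) for v in affine('o')]    # output gate
    g = [math.tanh(v) for v in affine('g')]  # candidate cell update
    c = [f[r] * c_prev[r] + i[r] * g[r] for r in range(n)]
    h = [o[r] * math.tanh(c[r]) for r in range(n)]
    return h, c
```

The cell state `c` is the "memory": it is carried across frames and only modified multiplicatively by the gates, which is what lets an LSTM retain context over the long utterances of continuous speech.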
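The path enumeration that CTC relies on rests on its collapse mapping, usually written B: repeated labels are merged and blanks removed, so many frame-level paths correspond to one label sequence and no frame-level annotation is needed. A minimal sketch of that mapping (label 0 used as the blank, an assumption):

```python
def ctc_collapse(path, blank=0):
    """CTC mapping B: merge consecutive repeats, then drop blanks.
    Works in one pass because a blank between two identical labels
    resets `prev`, so the second copy is kept (e.g. [1,0,1] -> [1,1])."""
    out, prev = [], None
    for p in path:
        if p != blank and p != prev:
            out.append(p)
        prev = p
    return out
```

Both `[1,1,0,2]` and `[0,1,2,2]` collapse to `[1,2]`, which is exactly why CTC can sum probability over all such paths instead of requiring one fixed frame alignment.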
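The decoding step described above (keep the three largest values per frame, run a beam search, de-duplicate) can be sketched as follows. This is a simplified path-level beam search for illustration, without the prefix-probability merging of a full CTC prefix search; the function name and beam parameters are assumptions, with label 0 as the blank.

```python
def ctc_beam_decode(probs, beam_width=3, topk=3, blank=0):
    """probs[t][l] is the posterior of label l at frame t.
    Only the top-k labels of each frame are expanded; the best
    surviving path is collapsed (merge repeats, drop blanks)."""
    beams = [((), 1.0)]  # (label path so far, path probability)
    for frame in probs:
        # prune each frame to its topk most probable labels
        cand = sorted(range(len(frame)), key=lambda l: frame[l],
                      reverse=True)[:topk]
        nxt = [(path + (lbl,), score * frame[lbl])
               for path, score in beams for lbl in cand]
        nxt.sort(key=lambda b: b[1], reverse=True)
        beams = nxt[:beam_width]         # keep the best beams only
    best_path, _ = beams[0]
    # de-duplicate: merge consecutive repeats, then remove blanks
    out, prev = [], None
    for p in best_path:
        if p != blank and p != prev:
            out.append(p)
        prev = p
    return out
```

Pruning to the top three labels per frame keeps the search cheap on an embedded target, since the number of expanded paths per frame is bounded by `beam_width * topk` regardless of vocabulary size.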
Keywords/Search Tags: speech recognition, deep learning, LSTM, embedded