Research On Deep Learning Speech Recognition Technology For IOT Terminal

Posted on: 2022-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhou
GTID: 2518306602964889
Subject: Master of Engineering
Abstract/Summary:
With the development of deep learning, speech recognition has shifted from traditional hidden Markov models to deep learning models, and running speech recognition on the embedded devices of Internet of Things (IoT) terminals is an inevitable trend. Existing deep learning models, however, are large and computationally demanding, which makes them difficult to deploy on embedded hardware. Their design must therefore be optimized for the limited performance of embedded platforms. This thesis applies compression, structural optimization, and fixed-point conversion to an existing speech recognition model, transforming it into a deep learning model suitable for embedded platforms. The system is tested experimentally on the ARC IOT DK embedded platform and optimized as a whole with the ARC V2 DSP instruction set to further improve its overall performance. The main work and contributions of this thesis are as follows.

First, this thesis defines the audio preprocessing pipeline and uses the Mel-frequency cepstral coefficient (MFCC) algorithm to extract audio features. Feature extraction turns each audio file into a speech feature vector, which serves as the input to the deep learning network.

Second, based on the limited storage and computing power of embedded platforms, this thesis designs a speech recognition network that reduces the model's size while preserving recognition accuracy. The network is based on the MobileNet model. It is compressed, pruned, and converted to fixed point, which significantly reduces both the number of parameters and the number of multiplications.

Next, a Long Short-Term Memory (LSTM) layer is added to the network. This strengthens the network's feature extraction along the time axis, raises the recognition rate of the whole network, and reduces the resulting accuracy loss.

Then, the proposed network is compared with commonly used speech recognition networks. The experimental results show that it has one tenth as many parameters as the commonly used networks with little change in accuracy. Its recognition rate is also more stable in the presence of noise. This stronger noise robustness makes it better suited to IoT terminal applications.

Finally, the speech recognition network is deployed on the ARC IOT DK embedded platform, where the system is further optimized with the ARC V2 DSP instruction set and tested experimentally. The results show that the DSP-optimized system improves substantially on the original system in both recognition rate and response speed. Its recognition rate nearly matches that of the floating-point network, and it responds twice as fast as before optimization. Faster recognition results improve the interactive experience.
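The MFCC feature-extraction step can be sketched for a single frame as follows. The abstract does not give the preprocessing parameters. The values used here are assumed common defaults: a 16 kHz sample rate, 25 ms (400-sample) frames, a 512-point transform, 26 mel filters, 13 coefficients and a 0.97 pre-emphasis factor. All function names are hypothetical. To stay self-contained, the sketch uses a naive DFT in pure Python and applies pre-emphasis per frame. An embedded implementation would more likely use an FFT and pre-emphasize the whole signal before framing.

```python
import math


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> list[list[float]]:
    """Triangular filters spaced evenly on the mel scale, over FFT bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):        # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def power_spectrum(frame: list[float], n_fft: int) -> list[float]:
    """|DFT|^2 / n_fft for bins 0..n_fft//2 (naive O(n^2) DFT, zero-padded)."""
    x = frame + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mfcc_frame(frame: list[float], sample_rate: int = 16000, n_fft: int = 512,
               n_filters: int = 26, n_ceps: int = 13, pre_emph: float = 0.97) -> list[float]:
    """MFCCs for one frame: pre-emphasis, Hamming window, power spectrum,
    mel filterbank energies, log, DCT-II."""
    # Pre-emphasis y[n] = x[n] - a*x[n-1] boosts high frequencies.
    y = [frame[0]] + [frame[n] - pre_emph * frame[n - 1] for n in range(1, len(frame))]
    # Hamming window tapers the frame edges to reduce spectral leakage.
    N = len(y)
    y = [v * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))) for n, v in enumerate(y)]
    spec = power_spectrum(y, n_fft)
    energies = [sum(w * p for w, p in zip(row, spec))
                for row in mel_filterbank(n_filters, n_fft, sample_rate)]
    # Floor the energies so log() is defined for silent frames.
    log_e = [math.log(max(e, 1e-10)) for e in energies]
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps.
    return [sum(log_e[m] * math.cos(math.pi * i * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for i in range(n_ceps)]


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(400)]  # 25 ms of a 1 kHz tone
    print([round(c, 2) for c in mfcc_frame(tone, sr)])
```

Stacking the coefficient vectors of consecutive frames gives a 2-D feature map. A map of this kind is what the network would take as input.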
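A MobileNet-style design saves parameters and multiplications mainly by replacing standard convolutions with depthwise separable convolutions. Each of these is a depthwise k x k convolution followed by a 1 x 1 pointwise convolution. The sketch below counts weights and multiplications for both kinds of layer. The layer shape is a hypothetical example, not a layer taken from the thesis: 3x3 kernels, 64 input and output channels and a 25x5 output map. Biases are ignored, and the function names are illustrative.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ConvCost:
    params: int  # number of weights (biases ignored)
    mults: int   # multiplications for one forward pass of the layer


def standard_conv_cost(k: int, c_in: int, c_out: int, out_h: int, out_w: int) -> ConvCost:
    """Standard k x k convolution: every output channel sees every input channel."""
    params = k * k * c_in * c_out
    # Each weight is multiplied once per output position.
    return ConvCost(params, params * out_h * out_w)


def depthwise_separable_cost(k: int, c_in: int, c_out: int, out_h: int, out_w: int) -> ConvCost:
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    params = depthwise + pointwise
    return ConvCost(params, params * out_h * out_w)


def reduction_ratio(k: int, c_out: int) -> Fraction:
    """Closed-form cost ratio (separable / standard) = 1/c_out + 1/k^2."""
    return Fraction(1, c_out) + Fraction(1, k * k)


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernels, 64 -> 64 channels, 25x5 output feature map.
    std = standard_conv_cost(3, 64, 64, 25, 5)
    sep = depthwise_separable_cost(3, 64, 64, 25, 5)
    print(std.params, std.mults)   # 36864 4608000
    print(sep.params, sep.mults)   # 4672 584000
    print(reduction_ratio(3, 64))  # 73/576
```

For this shape, the separable layer needs 73/576 (about 12.7%) of the standard layer's weights and multiplications. This ratio is a per-layer figure. The one-tenth parameter count reported in the abstract is a separate, whole-network comparison against commonly used networks.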
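Fixed-point conversion replaces floating-point weights and activations with scaled integers, so that inference can run on integer multiply-accumulate hardware. The thesis does not state its number format. The sketch below assumes Q15 (16-bit signed values with 15 fractional bits), a common DSP format, and its helper names are hypothetical. It mimics a fixed-point dot product in Python. Each product of two Q15 values is a Q30 value, and the products are summed in a wide accumulator. The sum is then rounded back to Q15, saturating rather than wrapping around on overflow.

```python
Q = 15                      # fractional bits in Q15
INT16_MIN = -(1 << 15)      # -32768
INT16_MAX = (1 << 15) - 1   #  32767


def saturate16(x: int) -> int:
    """Clamp to the int16 range instead of letting the value wrap around."""
    return max(INT16_MIN, min(INT16_MAX, x))


def to_q15(x: float) -> int:
    """Float in [-1, 1) -> Q15 integer, rounded to nearest and saturated."""
    return saturate16(round(x * (1 << Q)))


def from_q15(q: int) -> float:
    """Q15 integer -> float."""
    return q / (1 << Q)


def q15_dot(a: list[int], b: list[int]) -> int:
    """Dot product of two Q15 vectors, returned in Q15.

    Each Q15 x Q15 product is a Q30 value. Python ints never overflow, so
    `acc` stands in for the wide accumulator (with guard bits) that
    fixed-point multiply-accumulate hardware typically provides.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y                 # Q15 * Q15 -> Q30
    acc += 1 << (Q - 1)              # add half an LSB so the shift rounds to nearest
    return saturate16(acc >> Q)      # Q30 -> Q15 (arithmetic shift), then clamp


if __name__ == "__main__":
    weights = [to_q15(v) for v in (0.5, -0.25, 0.125)]
    inputs = [to_q15(0.5)] * 3
    print(from_q15(q15_dot(weights, inputs)))  # 0.1875
```

Saturation matters because an overflowing 16-bit result would otherwise wrap to a large value of the opposite sign. Many DSP instruction sets provide saturating arithmetic for exactly this purpose.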
Keywords/Search Tags: Speech recognition model, Fixed point, Embedded platform, DSP instruction set, Audio preprocessing