Design And Implementation Of Embedded Speech Recognition System For Noise Environment

Posted on: 2022-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
GTID: 2518306764976319
Subject: Automation Technology

Abstract/Summary:
As the most widely studied technology in the field of machine hearing, automatic speech recognition has important applications in smart homes, smart cars, smartphone virtual assistants and other scenarios. Under ideal conditions, the accuracy of automatic speech recognition is close to saturation. In noisy scenes, however, and especially in high-noise scenes, recognition accuracy drops sharply or recognition fails entirely under the influence of steady-state and non-steady-state noise. At the same time, some military application scenarios with strict energy requirements, such as aerospace satellites and all-weather land reconnaissance vehicles, make it imperative to design an embedded speech recognition mechanism with low energy consumption. This thesis therefore focuses on embedded speech recognition for a specific noise environment, and designs and implements a small embedded speech recognition system. The main innovations and work of this thesis can be summarized as follows:

1. Considering the influence of steady-state and non-steady-state noise on speech recognition, a real-time speech noise reduction front-end is designed and implemented as a pre-processing stage for speech recognition. Considering the hardware resource limits of the embedded platform, a lightweight speech recognition model, TDNN-13-SVD-M-Denoise-Trigram, is designed on the DNN-HMM framework. Experiments are carried out in combination with the speech denoising front-end. The AISHELL test set is evaluated in a specific strong-noise environment with a signal-to-noise ratio (SNR) in the range [-10, 2] dB. The character error rate (CER) of speech recognition is 6.81%, of which the noise reduction front-end contributes a 0.3% to 0.5% reduction in CER.

2. Considering speech recognition inference performance, a TDNN network acceleration engine is designed and implemented on an embedded heterogeneous computing environment (domestic FT-2000 A CPU and domestic 690 T FPGA). When tested on the TDNN-13-SVD-M-Denoise model, its inference is 3.07 times faster than inference with the OpenBLAS library.

3. Considering the requirements of practical applications, an embedded speech recognition API in C++ is optimized and implemented. It provides both a speech recognition function and a non-fixed keyword retrieval function. The DNN of the acoustic model can be accelerated by either OpenBLAS or the TDNN acceleration engine, selected through CMake configuration. The inference interface of the API is tested with 6-8 s speech clips: recognition takes about 1.2 s when accelerated by OpenBLAS and about 750 ms when accelerated by the TDNN engine. The CER of the API's speech recognition interface is 7.13%.

4. An offline, near-field embedded speech recognition system is designed and implemented on top of the API. It filters silent audio and pure background noise out of the recording, and segments sentences according to the pauses in the speech. The operating power consumption of the system is only 5-6 W, so it can support long-term use when equipped with a battery.

To sum up, the embedded speech recognition system designed in this thesis is noise-robust, real-time and low-power.
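The abstract reports CER figures (6.81% and 7.13%) but does not say how they were computed. As a minimal sketch, assuming the standard definition of CER (character-level edit distance divided by the total number of reference characters), the metric can be illustrated as follows. The function names `edit_distance` and `cer` are hypothetical and are not taken from the thesis or its API.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level character error rate.

    Assumption: spaces are ignored, as is common for Mandarin
    transcripts; errors are summed over all utterances before dividing.
    """
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        ref_chars = list(ref.replace(" ", ""))
        hyp_chars = list(hyp.replace(" ", ""))
        total_errors += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    if total_chars == 0:
        raise ValueError("reference transcripts contain no characters")
    return total_errors / total_chars


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(f"CER = {cer(['今天天气很好'], ['今天天汽很好']):.2%}")
```

Under this definition, a CER of 6.81% corresponds to roughly 7 character errors per 100 reference characters, summed over the whole test set rather than averaged per utterance.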
Keywords/Search Tags: Embedded Speech Recognition, Noise Reduction of Speech, Lightweight Network, Field Programmable Gate Array (FPGA)