Algorithm Research on Voice Keyword Spotting with Low Memory and Low Latency

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: T Zou
Full Text: PDF
GTID: 2518306050967309
Subject: Computer Science and Technology
Abstract/Summary:
Speech keyword spotting is a device wake-up technology that detects target keywords in input speech, and it is mainly used in the wake-up module of smart speakers. In recent years, with the rapid development of the mobile Internet, voice-based interaction has become widespread. As the front end of speech recognition, voice wake-up directly affects the efficiency and quality of the subsequent speech interaction. The core of voice wake-up is voice keyword detection, which analyzes the input speech to determine whether it contains a designated or target command. However, voice interaction is closely tied to the actual application scenario, and algorithm and hardware requirements differ from one scenario to another. In a smart speaker in particular, voice wake-up is only a small front-end module, so its memory budget is tight and the model's parameter size must be reduced. In addition, because the module faces the user directly, a good experience also demands fast response, and the amount of computation in the model directly determines the response speed of detection. A low-memory, low-latency voice keyword detection technology is therefore very important.

To address these problems, this thesis introduces an attention mechanism into a convolutional recurrent neural network and modifies the convolution so that it operates along both the time and frequency dimensions. This strengthens the model's predictive ability, makes better use of the temporal and time-frequency spectral information in speech, and reduces the model's memory consumption and prediction latency. The thesis also explores low memory consumption from the perspective of model compression. On top of the above architecture, singular value decomposition is used to compress the parameters: compared with the traditional model, the experimental model reduces the number of parameters by two-thirds without reducing accuracy. At the same accuracy of 97%, the model parameters of the entire system are reduced by about 20 KB and the latency by about 0.5 milliseconds. In application, the thesis further narrows the binary representation of the parameters, changing 8-bit parameters to a lower bit width (such as 6 bits), which significantly reduces the memory footprint with little impact on accuracy. It can be concluded that the proposed algorithm has good application prospects in resource-constrained scenarios.
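
The two compression steps mentioned in the abstract, low-rank factorization by singular value decomposition and a reduced bit width for the parameters, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the 128 x 128 synthetic weight matrix, the rank of 21 (chosen so that the factorized layer keeps roughly one third of the parameters), the symmetric uniform quantizer, and the helper names `svd_compress`, `quantize_uniform` and `dequantize`.

```python
import numpy as np


def svd_compress(weight, rank):
    """Approximate an (m x n) matrix by the product of (m x rank) and (rank x n) factors."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the kept singular values into the left factor
    b = vt[:rank, :]
    return a, b


def quantize_uniform(values, bits):
    """Symmetric uniform quantization to signed integers in [-(2^(bits-1)-1), 2^(bits-1)-1].

    Values are stored in int8 containers here; real memory savings for
    6-bit codes would additionally require bit packing (not shown).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(values / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to floating-point values."""
    return q.astype(np.float64) * scale


# Synthetic, approximately low-rank matrix standing in for a trained layer
# (hypothetical data, not the thesis model).
rng = np.random.default_rng(0)
m, n, rank = 128, 128, 21
weight = rng.standard_normal((m, 16)) @ rng.standard_normal((16, n))
weight += 0.01 * rng.standard_normal((m, n))

a, b = svd_compress(weight, rank)
params_before = weight.size        # m * n
params_after = a.size + b.size     # rank * (m + n)
rel_err = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)

# Quantize one factor to 6 bits and measure the worst-case rounding error.
q6, scale6 = quantize_uniform(a, bits=6)
max_quant_err = float(np.max(np.abs(dequantize(q6, scale6) - a)))

print(f"parameters: {params_before} -> {params_after}")
print(f"relative reconstruction error: {rel_err:.4f}")
print(f"6-bit max rounding error: {max_quant_err:.4f} (step size {scale6:.4f})")
```

A rank-r factorization of an m x n matrix stores r(m + n) values instead of mn, so the saving depends on how small a rank the layer tolerates. Whether accuracy is preserved on a real keyword spotting model has to be checked empirically, as the thesis reports doing.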
Keywords/Search Tags:Speech Keyword Spotting, Attention Mechanism, Singular Value Decomposition, Low Memory, Low Latency