Study On Keyword Recognition Based On Neural Network

Posted on: 2022-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
Full Text: PDF
GTID: 2518306509493074
Subject: Electronics and Communications Engineering
Abstract/Summary:
Keyword recognition aims to detect predefined keywords in audio streams, and it has a wide range of applications in fields such as smart terminals, service robots, and human-computer interaction. In recent years, neural networks have been used for keyword recognition to further improve recognition performance, but such models usually have hundreds of thousands of parameters, which makes them difficult to deploy in embedded systems. Network models can be compressed with techniques such as pruning and quantization, but often at the expense of recognition accuracy. Thus, despite the success of deep neural network methods, achieving high-accuracy keyword recognition with few parameters remains challenging. This thesis therefore studies a robust keyword recognition system. The main work is as follows:

(1) A keyword recognition method combining the Convolutional Block Attention Module (CBAM) with a decomposed convolution kernel is proposed. This method decomposes the K×K convolution kernel of a standard two-dimensional convolutional layer into a K×1 filter in a first CNN layer and a 1×K filter in a second CNN layer, stacks these layers into a residual structure to make the network easy to train, and adds CBAM to re-weight the output features of the convolutional layers and improve the recognition rate. This thesis compares the method with other methods on the Google Speech Commands Dataset and the Chinese Command Dataset. The experimental results show that the method achieves high recognition performance in a variety of experimental environments.

(2) A keyword recognition method based on one-dimensional depthwise separable convolution, grouped convolution, and dilated (hole) convolution is proposed. In this method, the main module consists of a one-dimensional depthwise separable convolution, a grouped pointwise convolution with a kernel size of 1, batch normalization (BatchNorm), and ReLU. These modules are stacked into a residual structure, and a dilation rate is applied in each convolutional layer to enlarge the receptive field. Based on this network structure, this thesis constructs extended models of different complexity and evaluates them on the Google Speech Commands Dataset and the Chinese Command Dataset. The experimental results show that the method has few parameters and a simple network structure. Even with only 99K trainable parameters, its recognition accuracy is 0.42% higher than that of the ResNet15 model.
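The parameter savings behind both methods can be illustrated with simple weight counts. The sketch below is not taken from the thesis. The channel width (64), kernel sizes (3 and 9), pointwise group count (4), and dilation schedule (1, 2, 4) are hypothetical values chosen only for illustration, and bias terms are omitted. It compares a K×K layer with a K×1 layer followed by a 1×K layer, compares a standard one-dimensional convolution with a depthwise convolution followed by a grouped pointwise convolution, and shows how dilation enlarges the receptive field without adding weights.

```python
def conv2d_params(c_in, c_out, kh, kw):
    """Weight count of a standard 2-D convolution (bias omitted)."""
    return c_in * c_out * kh * kw


def factorized_conv2d_params(c_in, c_out, k):
    """K x 1 layer (c_in -> c_out) followed by a 1 x K layer (c_out -> c_out)."""
    return conv2d_params(c_in, c_out, k, 1) + conv2d_params(c_out, c_out, 1, k)


def conv1d_params(c_in, c_out, k, groups=1):
    """Weight count of a 1-D convolution with channel groups (bias omitted)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return (c_in // groups) * c_out * k


def separable_conv1d_params(c_in, c_out, k, pointwise_groups=1):
    """Depthwise 1-D convolution followed by a grouped pointwise (kernel 1) one."""
    depthwise = conv1d_params(c_in, c_in, k, groups=c_in)
    pointwise = conv1d_params(c_in, c_out, 1, groups=pointwise_groups)
    return depthwise + pointwise


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical sizes, for illustration only.
print(conv2d_params(64, 64, 3, 3))                             # 36864
print(factorized_conv2d_params(64, 64, 3))                     # 24576
print(conv1d_params(64, 64, 9))                                # 36864
print(separable_conv1d_params(64, 64, 9, pointwise_groups=4))  # 1600
print(receptive_field(9, [1, 1, 1]))                           # 25
print(receptive_field(9, [1, 2, 4]))                           # 57
```

Under these assumed sizes, the factorized two-dimensional layer pair uses two thirds of the weights of the 3×3 layer, the separable one-dimensional block uses under 5% of the weights of the standard layer, and the dilated stack covers 57 input frames instead of 25 with the same number of weights. The actual layer sizes and dilation rates used in the thesis are not given in the abstract.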
Keywords/Search Tags: Keyword Spotting, Convolutional neural network, Convolutional Block Attention Module, Depthwise separable convolution, Group convolution, Deep learning