With the advent of smart homes, voice assistants, and similar products in recent years, intelligent speech recognition has moved from scientific research institutions into market applications. Intelligent voice interaction is changing people's lifestyles, and command word recognition has received extensive attention as a means of device interaction and voice control. Command word recognition is mainly applied in low-power devices with limited computing power, such as voice assistants, speech navigation systems, and wearable devices. The indexes used to evaluate a command word recognition system include wake-up rate, false alarm rate, real-time rate, user experience, and power consumption level, each of which poses a difficulty in system design. Improving the robustness of command word recognition in complex environments is therefore a challenge for researchers at this stage. In view of this situation, this paper studies command word recognition methods that improve real-time response and system robustness while reducing power consumption. The main achievements of this research are as follows:

1. An endpoint detection algorithm for low-SNR environments is proposed. First, the algorithm suppresses non-stationary noise and uses modulation-domain spectral subtraction to eliminate residual noise, so as to improve the signal-to-noise ratio and reduce speech distortion. Then, the power-normalized cepstral coefficients of each signal frame are extracted, and a robust endpoint detection parameter is obtained by calculating the power-normalized cepstral distance. Finally, the double-threshold method performs endpoint detection using this parameter. The experimental results show that, with non-stationary noise suppressed first and residual noise then removed by modulation-domain spectral subtraction, the endpoint detection algorithm can filter out non-speech signals, improve the real-time response of the command word recognition system in complex noise environments, and reduce power consumption, and therefore has practical value.

2. A command word recognition system based on a dual-microphone array and a deep residual network is studied. The improved residual model ResNets15 is adopted to build the command word recognition system, and dilated convolution is used to enlarge the receptive field and improve model performance. The system uses a dual-microphone array data set, extracts the power-normalized cepstral distance as the feature parameter, and feeds it into the residual network for training. After training, the recognition accuracy for command words exceeds 95%. In addition, a deeper model, ResNets50, is added; its accuracy, memory footprint, and power consumption meet the requirements for deployment on mobile devices. The multi-task system is especially suitable for voice-controlled equipment for the disabled. It can focus on the user's instructions and reduce interference from external speakers, thus achieving high-precision command word recognition.
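
The double-threshold decision in item 1 can be illustrated with a minimal sketch. It assumes the per-frame detection parameter (for example, the power-normalized cepstral distance) has already been computed. The function name `double_threshold_endpoints`, the threshold values, and the `min_frames` parameter are illustrative assumptions, not details taken from this paper.

```python
from typing import List, Sequence, Tuple


def double_threshold_endpoints(
    feature: Sequence[float],
    high: float,
    low: float,
    min_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected speech segments.

    A segment is triggered when the feature exceeds `high`, then
    extended in both directions while the feature stays above `low`.
    Segments shorter than `min_frames` are discarded as noise bursts.
    All parameter values are hypothetical and would need tuning.
    """
    segments: List[Tuple[int, int]] = []
    n = len(feature)
    i = 0
    while i < n:
        if feature[i] > high:
            # Walk backwards to the first frame above the low threshold.
            start = i
            while start > 0 and feature[start - 1] > low:
                start -= 1
            # Walk forwards to the last frame above the low threshold.
            end = i
            while end + 1 < n and feature[end + 1] > low:
                end += 1
            if end - start + 1 >= min_frames:
                segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments


# Toy per-frame parameter: one speech-like segment and one short spike.
frames = [0.1, 0.2, 0.6, 1.5, 2.0, 1.2, 0.7, 0.2, 0.1, 1.6, 0.1]
print(double_threshold_endpoints(frames, high=1.0, low=0.5))  # [(2, 6)]
```

In this toy input the isolated spike at frame 9 crosses the high threshold but lasts only one frame, so it is rejected. The low threshold lets the detected segment include the weaker onset and offset frames around the high-energy core.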