
Research On Speech Keyword Spotting Technology Based On Deep Learning

Posted on: 2020-06-10  Degree: Master  Type: Thesis
Country: China  Candidate: K N Chen  Full Text: PDF
GTID: 2518305981455484  Subject: Master of Agriculture
Abstract/Summary:
Automatic speech recognition (ASR) is a technique by which a machine automatically converts speech into text. Speech recognition is widely used in human-computer interaction, and speech keyword detection is a specialized form of it. As the volume of voice data grows, keyword search over speech becomes increasingly important. The task of speech keyword detection is to determine whether a given word or phrase appears in a speech segment and, if so, where. A typical speech keyword detection system consists mainly of automatic speech recognition and information retrieval. In recent years, deep neural networks have been applied successfully to speech recognition, and this thesis studies speech keyword detection based on deep neural networks.

The thesis first introduces the two-part system framework, which is divided into speech recognition and keyword search. The speech recognition part comprises signal feature processing and a decoding engine. The decoding engine relies on an acoustic model, a pronunciation dictionary, and a language model obtained in the training phase. In the keyword detection phase, the recognizer output is represented as a lattice, an index is built from it, and keywords are confirmed according to their confidence scores. A Chinese keyword detection system is first built on the THCHS-30 dataset using the open-source Kaldi toolkit and the F4DE toolkit. A comparison of DNN-HMM models with different activation functions against a traditional GMM-HMM system shows that the deep neural network recognizer outperforms the traditional one and improves keyword search performance by about 6.5%.

The baseline keyword search (KWS) system computes posterior scores from the word lattice and uses them to make "yes/no" decisions. The main problem with lattice-based posterior scores is that a correct hypothesis may receive a low posterior score, so the detector rejects it and it is counted as a missed detection. The goal of this thesis is therefore to improve keyword decisions by finding missed detections and raising their scores. The thesis studies the fusion of the DNN-HMM and GMM-HMM models to further improve ASR performance and, to address the recognizer's missed detections, a two-stage re-decision method. Using the fused ASR system, a first-pass candidate list of keyword detections is obtained. A multi-template matching method then computes similarity scores between the detections given a "yes" decision and those given a "no" decision, the similarity scores are converted into new posterior probability scores for the "no"-decision list, and finally the rescored results are normalized and subjected to a new threshold decision. The keyword detection system is built on the AISHELL-1 dataset using Kaldi and F4DE. Experiments compare the speech recognizer under the fusion model and the single models, and keyword detection with and without the template matching score. The fusion model achieves a lower word error rate (WER) and a higher actual term-weighted value (ATWV) than the single model. The experiments show that the template matching similarity score under the fusion model effectively improves the performance of the speech keyword detection system.
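To make the "yes/no" decision and the ATWV metric concrete, the following is a minimal sketch of how a threshold decision on posterior scores translates into a term-weighted value. It follows the commonly used NIST spoken term detection definition, TWV = 1 - average over keywords of (P_miss + beta * P_FA) with beta = 999.9. It is not the F4DE implementation and not the thesis's code: the names `Detection` and `term_weighted_value`, the single global threshold, and the toy data are assumptions made for illustration only.

```python
from dataclasses import dataclass

BETA = 999.9  # weight on false alarms in the usual NIST STD definition


@dataclass
class Detection:
    keyword: str
    score: float      # posterior-style confidence in [0, 1]
    is_correct: bool  # True if it matches a reference occurrence


def term_weighted_value(detections, n_true, speech_seconds, threshold, beta=BETA):
    """TWV at a given threshold; keywords with no reference occurrence are skipped.

    n_true maps each keyword to its number of reference occurrences.
    speech_seconds approximates the number of trials (one per second of audio).
    """
    keywords = [kw for kw, n in n_true.items() if n > 0]
    total = 0.0
    for kw in keywords:
        # "yes" decisions: hypotheses whose score reaches the threshold
        accepted = [d for d in detections
                    if d.keyword == kw and d.score >= threshold]
        n_correct = sum(d.is_correct for d in accepted)
        n_false_alarm = len(accepted) - n_correct
        p_miss = 1.0 - n_correct / n_true[kw]
        p_fa = n_false_alarm / (speech_seconds - n_true[kw])
        total += p_miss + beta * p_fa
    return 1.0 - total / len(keywords)


# Toy data (hypothetical): one true hit has a low score and is missed at 0.5.
dets = [
    Detection("beijing", 0.9, True),
    Detection("beijing", 0.3, True),
]
refs = {"beijing": 2}
twv_before = term_weighted_value(dets, refs, speech_seconds=100.0, threshold=0.5)

# Raising the score of the missed hit (the aim of a re-decision stage) recovers it.
dets[1].score = 0.7
twv_after = term_weighted_value(dets, refs, speech_seconds=100.0, threshold=0.5)
```

In this toy case the low-scoring correct hypothesis is rejected at the threshold and counts as a miss. Once its score is raised above the threshold it becomes a "yes" decision and the term-weighted value increases, which is the effect a re-scoring stage aims for.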
Keywords/Search Tags: Speech Recognition, Deep Neural Network, Spoken Term Detection, Fusion Model, Template Matching