Research On Speech Keyword Spotting Technology Based On Deep Learning

Posted on:2020-06-10

Degree:Master

Type:Thesis

Country:China

Candidate:K N Chen

GTID:2518305981455484

Subject:Master of Agriculture

Abstract/Summary:

Automatic speech recognition (ASR) is a technique by which a machine automatically transcribes speech into text. Speech recognition has a wide range of applications in human-computer interaction, and speech keyword detection (spoken term detection) is a special form of speech recognition. As the volume of speech data grows, keyword search over speech becomes increasingly important. The task of speech keyword detection is to determine whether a given word or phrase appears in a speech segment and, if so, where it appears. A typical keyword detection system consists of two parts: automatic speech recognition and information retrieval. With the development of speech recognition technology in recent years, deep neural networks have been applied successfully to speech recognition, and this thesis studies speech keyword detection based on deep neural networks.

The thesis first introduces the two-part system framework, which is divided into speech recognition and keyword search. The speech recognition framework comprises signal feature processing and a decoding engine. The decoding engine uses an acoustic model, a pronunciation dictionary, and a language model, all built in the training phase. In the keyword detection phase, the recognition output is represented as lattices, an index is built from them, and keywords are confirmed according to their confidence scores. A Chinese keyword detection system is first built on the THCHS-30 dataset using the open-source Kaldi toolkit and the F4DE toolkit. Comparing DNN-HMM models with different activation functions against a traditional GMM-HMM system shows that the deep neural network recognizer outperforms the traditional model and improves keyword search performance by about 6.5%.

The baseline KWS system computes posterior scores from word lattices and uses them to make "YES/NO" decisions. The main weakness of relying on word-lattice posteriors is that a correct hypothesis may receive a low posterior score, so the detector rejects it and it becomes a miss. The goal of this thesis is therefore to improve keyword decisions by detecting such misses and raising their scores. First, the thesis studies the fusion of DNN-HMM and GMM-HMM models to further improve ASR performance. Second, to address the recognizer's missed detections, it studies a two-stage re-decision method. Using the fused ASR system, a candidate list is obtained from a first keyword detection pass. A multi-template matching method then computes similarity scores between the detections judged "YES" and those judged "NO", these similarity scores are converted into new posterior scores for the "NO" list, and finally the rescored list is normalized and a new threshold decision is made. The keyword detection system for these experiments is built on the AISHELL-1 dataset using Kaldi and F4DE. Experiments compare the speech recognition performance of the fusion model with that of the single models, and the keyword detection performance with and without template-matching scores. The fusion model achieves a lower WER and a higher ATWV than the single models, and the experiments show that template-matching similarity scores under the fusion model effectively improve the performance of the speech keyword detection system.
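
The two-stage re-decision described in the abstract can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the thesis's implementation: the abstract does not specify the features, the template-matching measure, how similarity becomes a new score, or the normalization. Here we assume dynamic time warping (DTW) with a Euclidean frame distance, a multi-template similarity equal to the mean of exp(-distance) over all "YES" templates of the same keyword, linear interpolation with a weight `alpha` to form the new score, and per-keyword sum-to-one normalization before the final threshold. The names `Detection`, `dtw_distance`, `template_similarity`, `redecide` and every parameter value are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    keyword: str
    score: float                 # first-pass lattice posterior in [0, 1]
    features: list[list[float]]  # per-frame acoustic features of the hit (assumed)


def dtw_distance(a: list[list[float]], b: list[list[float]]) -> float:
    """DTW alignment cost with Euclidean frame distance, normalised by len(a) + len(b)."""
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def template_similarity(query: list[list[float]], templates: list[list[list[float]]]) -> float:
    """Assumed multi-template similarity: mean of exp(-DTW distance) over all templates."""
    return sum(math.exp(-dtw_distance(query, t)) for t in templates) / len(templates)


def redecide(detections: list[Detection], first_threshold: float = 0.5, alpha: float = 0.5,
             gamma: float = 1.0, final_threshold: float = 0.3) -> list[tuple[str, float, bool]]:
    """Return (keyword, normalised score, YES?) for each detection, in input order."""
    # Stage 1: hits above the first threshold form the YES list and serve as templates.
    templates: dict[str, list[list[list[float]]]] = {}
    for d in detections:
        if d.score >= first_threshold:
            templates.setdefault(d.keyword, []).append(d.features)

    # Stage 2: rescore NO-list hits that have at least one YES template for their keyword
    # (assumed linear interpolation of the old posterior and the similarity).
    scores = []
    for d in detections:
        if d.score < first_threshold and d.keyword in templates:
            sim = template_similarity(d.features, templates[d.keyword])
            scores.append(alpha * d.score + (1 - alpha) * sim)
        else:
            scores.append(d.score)

    # Assumed normalisation: sum-to-one per keyword, then one new threshold for all hits.
    totals: dict[str, float] = {}
    for d, s in zip(detections, scores):
        totals[d.keyword] = totals.get(d.keyword, 0.0) + s ** gamma
    results = []
    for d, s in zip(detections, scores):
        total = totals[d.keyword]
        norm = s ** gamma / total if total > 0 else 0.0
        results.append((d.keyword, norm, norm >= final_threshold))
    return results


# Hypothetical example: the second hit matches the YES template exactly, the third does not.
hits = [
    Detection("kw", 0.9, [[0.0, 0.0], [1.0, 1.0]]),
    Detection("kw", 0.3, [[0.0, 0.0], [1.0, 1.0]]),
    Detection("kw", 0.3, [[5.0, 5.0], [6.0, 6.0]]),
]
for kw, score, yes in redecide(hits):
    print(kw, round(score, 3), yes)
```

In this example the second hit, missed in the first pass, is promoted to YES because it aligns perfectly with the YES template, while the third hit stays NO. Averaging over all templates makes the new score less sensitive to one atypical YES example; taking the best match over templates would be an equally plausible choice.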
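
ATWV (actual term-weighted value), the keyword detection metric reported in the abstract, is the term-weighted value measured at the system's actual YES/NO decisions. The sketch below follows the NIST spoken term detection definition, TWV = 1 - mean over keywords of [P_miss + β·P_FA], with β = 999.9 and one non-target trial per second of speech. F4DE derives the per-keyword counts by aligning detections with the reference; here those counts are assumed to be already available. `KeywordCounts`, `term_weighted_value` and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeywordCounts:
    n_true: int         # reference occurrences of the keyword
    n_correct: int      # YES decisions that match a reference occurrence
    n_false_alarm: int  # YES decisions with no matching reference occurrence


def term_weighted_value(counts: dict[str, KeywordCounts], speech_seconds: float,
                        beta: float = 999.9) -> float:
    """TWV = 1 - mean_kw [P_miss(kw) + beta * P_FA(kw)].

    Counts taken at the system's actual decision threshold give ATWV.
    Keywords with no reference occurrences are skipped. Non-target trials are
    approximated as one per second of speech minus the true occurrences.
    """
    losses = []
    for c in counts.values():
        if c.n_true == 0:
            continue
        p_miss = 1.0 - c.n_correct / c.n_true
        p_fa = c.n_false_alarm / (speech_seconds - c.n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no keyword has reference occurrences")
    return 1.0 - sum(losses) / len(losses)


# Hypothetical counts for one hour of speech.
counts = {
    "beijing": KeywordCounts(n_true=10, n_correct=8, n_false_alarm=1),
    "tianqi": KeywordCounts(n_true=5, n_correct=5, n_false_alarm=0),
    "unseen": KeywordCounts(n_true=0, n_correct=0, n_false_alarm=2),  # skipped
}
print(round(term_weighted_value(counts, speech_seconds=3600.0 * 10), 4))
```

With β = 999.9 and ten hours of speech, one false alarm adds about 0.028 to a keyword's loss, while one miss out of ten true occurrences adds 0.1. This is why recovering missed detections, as the re-decision method aims to do, can raise ATWV as long as it does not introduce too many false alarms.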

Keywords/Search Tags:

Speech Recognition, Deep Neural Network, Spoken Term Detection, Fusion Model, Template Matching

Related items

1	Deep Learning For Spoken Term Detection
2	Research On Out-of-vocabulary Spoken Term Detection
3	Research On Spoken Term Detection Based On ASR Under Limited-resource Conditions
4	Research On Spoken Term Detection Technology In Continuous Speech Based On Sample Template
5	Research On Chinese Spoken Term Detection Based On Deep Learning
6	A Study Of Key Problems In Spoken Term Detection
7	Research On Fast Query-by-example Spoken Term Detection Based On Template Matching
8	Research On Uyghur Speech Recognition Based On Deep Learning
9	Research On WFST Based Spoken Term Detection
10	Research On Mandarin Speech Recognition Technology Based On Deep Neural Network