Spoken Keyword Spotting Method And System Design Based On CRNN-CTC

Posted on:2022-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:H K Yan

GTID:2518306569972659

Subject:Communication and Information System

Abstract/Summary:

With the rapid development of deep learning, the performance of spoken keyword spotting (KWS) has improved greatly. However, because of the high complexity of the languages themselves and the lack of annotated corpora, many minority languages, such as the Hakka dialect, have not been fully studied for KWS, and few speech intelligence applications exist for them. This thesis studies KWS for the Hakka dialect spoken in the Ganzhou area of Jiangxi Province. It proposes a KWS method that uses a Convolutional Recurrent Neural Network (CRNN) with Connectionist Temporal Classification (CTC). The method is first verified on Mandarin and then applied to the Hakka dialect. Finally, a KWS system is built. The main work of this thesis is as follows:

1. A CRNN-CTC based KWS method is proposed, which combines a Convolutional Neural Network (CNN) with RNN-CTC. Experiments on the public AISHELL-2 Mandarin corpus show that, on the 12-keyword and 20-keyword tasks, the proposed CRNN-CTC method achieves false reject rates of 4.82% and 5.38%, respectively, at 0.5 false alarms per keyword per hour. These are relative decreases of 38.83% and 58.81% compared with the RNN-CTC method. The training time is also shorter.

2. Given the pronunciation characteristics of Mandarin, different modeling units are compared systematically: Chinese characters, tonal syllables, words, and initials combined with tonal finals. The experimental results show that using all initials and tonal finals as the modeling units gives the best performance on Mandarin KWS.

3. A Hakka speech corpus of about 447 hours is collected in this thesis, and the CRNN-CTC based method is extended to Hakka. Considering the language characteristics of the Hakka dialect, the thesis explores whether Hakka and Mandarin share the same optimal modeling unit, and analyzes in detail the reasons for the performance gap between Hakka and Mandarin. The experimental results on the collected Hakka corpus show that, among the four modeling units above, the Hakka dialect performs best with tonal syllables, which differs from Mandarin. On the 12-keyword and 50-keyword Hakka KWS tasks, the proposed method achieves false reject rates of 12.43% and 11.88%, respectively, at 0.5 false alarms per keyword per hour.

4. To improve the purity of the Hakka speech corpus, a speech sample reliability indicator based on weighted edit distance is designed on top of the CRNN-CTC KWS method proposed in this thesis. The indicator is weighted according to the number of false alarms and false rejects of keywords, and it combines the model's decoding outputs from different training epochs. It is then used to screen out the samples most likely to have noisy labels.

5. A KWS system that can process multiple speech streams concurrently is built. The system exposes an API request interface and has two working modes: offline non-real-time keyword spotting and online real-time keyword spotting. In testing, the system still processes about 60 seconds of speech data per second on a general-purpose computer without a GPU.
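
As an illustration of how a CTC-trained model's frame-level output can be turned into a keyword decision, the following is a minimal Python sketch of best-path (greedy) CTC decoding followed by a contiguous keyword match. It is not the decoder used in the thesis, whose details the abstract does not give. The blank index, the toy scores, the unit numbering, and the function names `ctc_greedy_decode` and `contains_keyword` are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol in this sketch


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label index in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # groupby merges consecutive duplicates; blanks are then removed.
    return [label for label, _ in groupby(best_path) if label != blank]


def contains_keyword(decoded, keyword):
    """Return True if `keyword` appears as a contiguous run in `decoded`."""
    n = len(keyword)
    return any(decoded[i:i + n] == keyword
               for i in range(len(decoded) - n + 1))


# Toy scores over 4 labels: 0 = blank, 1..3 = hypothetical modeling units
# (for example, three tonal syllables).
frames = [
    [0.10, 0.80, 0.05, 0.05],  # unit 1
    [0.10, 0.70, 0.10, 0.10],  # unit 1 again, merged with the previous frame
    [0.90, 0.05, 0.03, 0.02],  # blank, separates the two occurrences of unit 1
    [0.20, 0.60, 0.10, 0.10],  # unit 1, kept as a second occurrence
    [0.10, 0.10, 0.70, 0.10],  # unit 2
    [0.10, 0.10, 0.10, 0.70],  # unit 3
]

decoded = ctc_greedy_decode(frames)
hit = contains_keyword(decoded, [1, 2, 3])
```

Here the blank frame is what lets the same unit appear twice in a row in the decoded sequence. A practical system would more likely score keywords against the full CTC output distribution and apply a threshold, which is how an operating point such as a fixed number of false alarms per keyword per hour is selected.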

Keywords/Search Tags:

spoken keyword spotting, connectionist temporal classification, Hakka, modeling units, speech samples screening

Related items

1	Research On Speech Keyword Spotting Technology Based On Deep Learning
2	Research On Human Computer Interaction Based On Speech Keyword Spotting
3	Research On Connectionist Temporal Classification In Speech Recognition
4	Whisper speech processing: Analysis, modeling, and detection with applications to keyword spotting
5	Keyword spotting in continuous speech utterances
6	Research On Speech Keyword Spotting Technology Based On Deep Learning
7	The Mandarin Continuous Speech Keyword Spotting System Medium Vocabulary
8	Research On Speech Keyword Spotting Technology For Mongolian
9	Research On Keyword Spotting Technology Of Chinese Speech Recognition System
10	Rapid Keyword Spotting In Continuous Speech