Speech keyword spotting (KWS) is a critical technology for human-computer interaction. It attends to short segments of an audio stream and usually serves as the intelligent wake-up interface of a device: only when the user utters a specific instruction or word are the complex downstream modules triggered, which lets the device run for long periods in a low-power standby state. In recent years, with the rapid development of deep learning, the performance of KWS systems has improved dramatically. Nevertheless, KWS still faces many challenges, such as data imbalance, inefficient sample utilization, and slow training. This thesis studies these problems; its specific contributions are as follows:

1. A number-of-errors guided re-weighted loss function that alleviates the impact of data imbalance. Data imbalance is common in KWS training: a large amount of diverse negative training data, including samples whose pronunciation is similar to the keyword, is needed to reduce false alarms, and such negative data is easy to collect, whereas positive keyword data is expensive. During training, the many easily classified negative samples overwhelm the loss and dominate the backpropagated gradient, yielding a degenerate model. To address this, the thesis proposes a novel re-weighted loss. It measures each sample's importance by its number of detection errors during training and automatically down-weights the contribution of easy examples, most of which are negative, so that training focuses on the samples that deserve more attention (see the first sketch below). The method alleviates the imbalance naturally while efficiently using all available data. On several sets of keywords selected from AISHELL-1 and AISHELL-2, it achieves 16%–38% relative reductions in false rejection rate over the standard loss at 0.5 false alarms per keyword per hour.

2. A sample utilization strategy based on class uncertainty that improves training efficiency. In the conventional deep learning paradigm, every data point contributes equally regardless of the underlying distribution, and every sample participates throughout training. This is inefficient, because the "learning difficulty" of a sample varies across training stages: large numbers of easy examples that the model already classifies correctly keep participating without restriction, wasting computation. Analogous to the "three zones" theory of human cognition ("only by choosing activities in the learning zone can one make progress"), this thesis proposes a learning mechanism that concentrates training on "learning zone" samples. It constructs a sample utilization probability from the model's outputs in a feedback manner, focusing on under-trained samples near the decision boundary, and in the middle and late stages of training it removes the many samples that carry little learning value for the current model, i.e., it trains on a subset (see the second sketch below). Several KWS experiments on the Google Speech Commands dataset show that the method reduces training time by 59.47%–64.86% relative to the original approach while accuracy drops by only about 1%. An image classification experiment on CIFAR-10 further verifies its effectiveness: with a relative accuracy reduction of only 0.85%, training time is reduced by 65.07%.
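
To make the first contribution concrete, the following is a minimal PyTorch sketch of an error-count-based re-weighted loss, assuming a classifier over keyword/filler classes and a dataset that exposes per-sample indices (sample_ids). The counter update, the smoothing term, and the weighting function are illustrative assumptions, not the thesis's exact formulation.

import torch
import torch.nn.functional as F

class ErrorCountReweightedLoss:
    """Cross-entropy re-weighted by each sample's running count of
    detection errors: samples that are rarely misclassified (mostly
    easy negatives) receive small weights. Illustrative sketch only."""

    def __init__(self, num_samples, smooth=1.0):
        self.err = torch.zeros(num_samples)  # one error counter per sample
        self.smooth = smooth                 # keeps never-wrong samples nonzero
        self.epochs_seen = 0

    def __call__(self, logits, targets, sample_ids):
        # sample_ids: CPU LongTensor of dataset indices for this batch.
        pred = logits.argmax(dim=1)
        self.err[sample_ids] += (pred != targets).float().cpu()
        # Weight ~ fraction of epochs in which the sample was misclassified.
        w = (self.err[sample_ids] + self.smooth) / (self.epochs_seen + 1 + self.smooth)
        w = w.to(logits.device)
        per_sample = F.cross_entropy(logits, targets, reduction="none")
        return (w * per_sample).sum() / w.sum()  # weighted mean

    def end_epoch(self):
        self.epochs_seen += 1

Normalizing by w.sum() keeps the loss on the same scale as an unweighted mean, so the learning rate need not be retuned as the weights evolve over training.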
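
Similarly, a minimal sketch of the second contribution's uncertainty-driven sample utilization, assuming softmax outputs and a top-2-margin notion of "closeness to the decision boundary"; the margin rule, the confidence threshold, and the floor probability are hypothetical choices for illustration rather than the thesis's exact construction.

import torch

def utilization_probability(probs, targets, floor=0.05, conf_threshold=0.9):
    """Map softmax outputs to a per-sample probability of being used.
    Samples near the decision boundary (small top-2 margin) are kept;
    confidently correct ones fall back to a small floor so they are
    still revisited occasionally. Illustrative sketch only."""
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    top2 = probs.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]            # small margin = uncertain
    p_keep = (1.0 - margin).clamp(floor, 1.0)
    easy = p_true > conf_threshold               # confidently correct
    return torch.where(easy, torch.full_like(p_keep, floor), p_keep)

def select_subset(logits, targets):
    """Draw a Bernoulli mask from the utilization probabilities;
    only the kept subset of the batch enters backpropagation."""
    probs = torch.softmax(logits.detach(), dim=1)
    mask = torch.bernoulli(utilization_probability(probs, targets)).bool()
    return mask

Because the probabilities are recomputed from the model's current outputs at every step, the kept subset shrinks in the middle and late stages of training as more samples become confidently correct, which is the feedback behavior the abstract describes.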