
Adversarial Learning For Robust Speech Recognition

Posted on: 2020-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S N Sun
Full Text: PDF
GTID: 1488306740472934
Subject: Computer Science and Technology
Abstract/Summary:
With the successful application of deep learning (DL) based speech recognition in real-world scenarios, improving the robustness of the acoustic model has become crucial. A robust acoustic model should be insensitive to factors unrelated to the speech recognition task, such as environmental noise, channel, speaker, and accent, and should perform well even in adverse real-world environments. Recently, deep adversarial learning, built on deep neural networks (DNNs), has shown great potential for data generation, feature learning, and model adaptation. In this thesis, we tackle robust acoustic modeling from an adversarial learning perspective. Several adversarial learning techniques, including domain adversarial training (DAT), adversarial examples, and adversarial regularization, are explored and improved in light of the particular properties of speech signals and models. The main contributions of this dissertation are as follows:

(1) An unsupervised DAT-based robust acoustic modeling strategy is proposed. On the Aurora-4 data set, the proposed approach yields up to a 37.8% relative word error rate (WER) reduction. Distribution mismatch between training and test data, caused by noise and channel distortions, is a common issue in robust acoustic modeling. To reduce this mismatch, unsupervised DAT is applied during acoustic model training to learn domain-invariant features and improve the model's performance on noisy test data.

(2) Semi-supervised and supervised DAT are proposed. On a 960-hour accent-robustness task, semi-supervised DAT obtains a 7.5% relative WER reduction. Fully unsupervised DAT cannot sufficiently exploit the information in target-domain data. To make full use of that information, we explore the effectiveness of DAT in supervised and semi-supervised scenarios for the accent-robustness task.

(3) An adversarial-example-based data augmentation approach for acoustic model training is proposed. The method obtains 14.1% and 5.7% relative WER reductions on two corpora, respectively. The fast gradient sign method (FGSM) is used to generate adversarial examples dynamically during acoustic model training, which makes the model insensitive to small local perturbations. Furthermore, teacher/student (T/S) learning is combined with the proposed augmentation approach.

(4) An end-to-end automatic speech recognition (ASR) model is trained using a loss function with an adversarial regularization term. On AISHELL-2, a 1000-hour open-source Mandarin Chinese corpus, the method obtains a 12.2% relative character error rate reduction. End-to-end ASR models suffer severely from a lack of smoothness: a small perturbation of the input can easily propagate to the output. To address this, we train the end-to-end model using a loss function with an adversarial regularization term. FGSM and local distributional smoothness (LDS) techniques are improved for sequence-to-sequence models.

(5) Adversarial examples and adversarial regularization are explored for the deep learning based keyword spotting (KWS) task. The proposed methods reduce false rejects by 31% to 45% at a false alarm rate of 1.0%. False alarms (FA) and false rejects (FR) are unavoidable and not reproducible in the KWS task. Adversarial-example-based data augmentation and regularization are applied to the Deep KWS model and to an end-to-end KWS model. Several adversarial perturbation strategies are explored, and the proposed method significantly reduces FA and FR rates.
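To make the FGSM-based augmentation in contribution (3) concrete, here is a minimal sketch. It is not the thesis implementation: a toy logistic-regression classifier stands in for the acoustic model, and the names `loss_and_grads`, `fgsm`, and `train_step`, as well as the values of `epsilon` and `lr`, are assumptions chosen for illustration. The point it shows is that adversarial examples are regenerated from the current parameters at every step and trained on together with the clean batch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(w, b, x, y):
    """Mean binary cross-entropy and its gradients w.r.t. w, b and the input x."""
    p = sigmoid(x @ w + b)
    tiny = 1e-12  # avoids log(0)
    loss = -np.mean(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    err = (p - y) / len(y)      # d(loss)/d(logit) for each example
    grad_w = x.T @ err
    grad_b = err.sum()
    grad_x = np.outer(err, w)   # d(loss)/d(x), same shape as x
    return loss, grad_w, grad_b, grad_x


def fgsm(w, b, x, y, epsilon):
    """FGSM: move each input by epsilon along the sign of the loss gradient."""
    _, _, _, grad_x = loss_and_grads(w, b, x, y)
    return x + epsilon * np.sign(grad_x)


def train_step(w, b, x, y, epsilon=0.1, lr=0.5):
    """One SGD step on the clean batch plus freshly generated adversarial copies."""
    x_adv = fgsm(w, b, x, y, epsilon)   # uses the current parameters (hypothetical setup)
    x_aug = np.concatenate([x, x_adv])
    y_aug = np.concatenate([y, y])      # adversarial copies keep the original labels
    _, grad_w, grad_b, _ = loss_and_grads(w, b, x_aug, y_aug)
    return w - lr * grad_w, b - lr * grad_b
```

In a real acoustic model the input gradient would come from automatic differentiation through the network rather than a closed-form expression, but the perturbation rule `x + epsilon * sign(grad_x)` stays the same.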
Keywords/Search Tags:Robust speech recognition, Deep adversarial learning, Domain adversarial training, Adversarial example, End-to-end speech recognition