
Deep Learning As A Speaker Spoofing Countermeasure

Posted on: 2017-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Heinrich Dinkel
Full Text: PDF
GTID: 2518305897976839
Subject: Computer Science and Technology
Abstract/Summary:
Recent developments in algorithms and hardware for speech applications have enabled researchers to introduce deep neural networks (DNN) as a state-of-the-art machine learning framework, moving away from traditional approaches based on Hidden Markov Models (HMM) and Gaussian Mixture Models. Compared to the traditional models, DNNs can be deployed in nearly any task. Despite their huge success in image and speech recognition tasks, their full potential has not yet been explored by the speaker identification community. While the speaker identification community focuses on building potent speaker-discriminative systems, a new threat to speaker identification was recently described: attacks in the form of spoofed utterances. Spoofed utterances can severely harm a preexisting speaker identification system, to the point at which unauthorized users can gain access to the system. Initial research on combating this threat culminated in two spoofing challenges: ASVSpoof2015 and BTAS2016. These challenges are the first to provide sufficient data to investigate methods that can prevent these spoofing threats. This thesis focuses on the construction of deep neural network based spoofing detectors. Although neural networks have already been used in this field, they mostly act as feature extractors, which marginalizes their classification potential. In this work, we revise the deep feature extraction framework and propose two approaches to the spoofing problem: an end-to-end system, capable of directly accepting or rejecting an utterance from raw waveforms, and a small-scale convolutional network that outperforms any preexisting neural network approach by a large margin while using significantly fewer parameters and less training time. The proposed end-to-end CLDNN model improves on the best current result on the BTAS2016 corpus, reducing the half total error rate (HTER) from 1.21 % to 0.82 %. In addition, the small-footprint DCNN model improves on the previously best neural network based model on the ASVSpoof2015 corpus, reducing the equal error rate (EER) from 1.1 % to 0.7 %.
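The abstract reports results as HTER and EER without defining them. The following is a minimal sketch of how these two metrics are commonly computed from detector scores, not the thesis's own evaluation code. It assumes that a higher score means "more likely genuine" and that an utterance is accepted when its score is at or above the threshold. The function names and the toy scores are hypothetical and chosen only for illustration.

```python
def error_rates(genuine, spoofed, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted as genuine.

    FAR: fraction of spoofed utterances wrongly accepted.
    FRR: fraction of genuine utterances wrongly rejected.
    """
    far = sum(s >= threshold for s in spoofed) / len(spoofed)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def hter(genuine, spoofed, threshold):
    """Half total error rate: mean of FAR and FRR at a fixed threshold."""
    far, frr = error_rates(genuine, spoofed, threshold)
    return (far + frr) / 2


def eer(genuine, spoofed):
    """Approximate the equal error rate by sweeping observed scores.

    Returns (rate, threshold) at the threshold where |FAR - FRR| is smallest.
    """
    best = None
    for t in sorted(set(genuine) | set(spoofed)):
        far, frr = error_rates(genuine, spoofed, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (illustrative only, not data from the thesis).
genuine = [0.9, 0.8, 0.75, 0.6, 0.3]
spoofed = [0.7, 0.4, 0.35, 0.2, 0.1]

rate, threshold = eer(genuine, spoofed)
print(f"EER  = {rate:.2%} at threshold {threshold}")
print(f"HTER = {hter(genuine, spoofed, 0.5):.2%} at threshold 0.5")
```

In this sketch the EER is found on the same data it is reported on, whereas the HTER is evaluated at a threshold supplied from outside, typically one tuned on a separate development set. That difference is the usual reason the two numbers are not directly comparable across corpora.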
Keywords/Search Tags: Deep Neural Networks, Speaker spoof detection, CLDNN, CNN