The Detection Of Voice Replay Attack Based On Deep Learning

Posted on: 2020-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhao
Full Text: PDF
GTID: 2428330575993761
Subject: Electronic and communication engineering
Abstract/Summary:
Using recaptured (replayed) speech to impersonate another person's identity poses a serious threat to social security. Reported work has demonstrated that existing automatic speaker recognition (ASR) systems are vulnerable to this kind of attack, so research on the security issues raised by replay attacks is of great importance. However, existing research remains insufficient and significantly limited, because it focuses on detecting recaptured speech with traditional signal processing methods whose feature extraction algorithms are complicated and unstable. In this thesis, we therefore study voice replay attack detection within the data-driven framework of deep learning, with the goal of distinguishing replayed speech from genuine speech. We propose detection algorithms based on a convolutional neural network (CNN) and on a recurrent neural network (RNN), respectively. The main contributions are as follows.

1. A detection algorithm based on a convolutional neural network. The proposed network architecture is designed specifically around the characteristics of spectrograms and is highly consistent with the distribution of spectral features. As a result, trainable parameters are assigned to more reasonable positions, and a more compact set of parameters can be trained on more effective features, which significantly reduces the risk of over-fitting. Because existing approaches generally lack versatility, i.e., robustness across different scenarios, we systematically investigate and test several key factors, including speakers' identities, contexts, recording devices and recording distances. The experimental results show that the algorithm achieves accuracy rates higher than 99.8% in different recording scenarios, indicating a high degree of versatility. In addition, the speech segments used in the experiments are as short as 0.2 seconds, indicating that the proposed algorithm is widely applicable in practice.

2. An end-to-end replay attack detection algorithm based on a recurrent neural network, which directly models speech waveform data in the time domain. A CNN model can only detect fixed-length speech segments, whereas an RNN model can detect segments of different lengths in different scenarios. The proposed model uses multiple sets of one-dimensional convolution kernels of different lengths, together with large convolution strides, to extract features along the time axis. It accumulates historical information through the RNN and thereby achieves end-to-end detection for segments of various lengths. The experimental results show that the algorithm has a detection rate of over 99.3% for 0.5-second speech segments, and that accuracy increases with segment length. However, due to the sparsity of time-domain data, this model does not perform well on very short segments: the detection rate is only 95.9% for 0.2-second segments. To address this, we further propose a spectrogram-based RNN model that is trained by means of transfer learning. Features are more concentrated in the spectrogram, which serves as the input to this model. Moreover, the spectrogram-based CNN model in contribution 1 achieves a very high detection rate on short segments, which indicates that its extracted features are effective, so this RNN model is initialized with some of the parameters of that CNN model (transfer learning). The experimental results show that the detection rate for 0.2-second segments reaches 99.3%, indicating that the proposed method significantly improves robustness on short segments. Here too, accuracy increases with segment length.

The voice replay attack detection algorithms proposed in this thesis achieve excellent performance and can be used as a detection module that makes ASR systems robust to this kind of attack. The research in this thesis is of significant importance to information security.
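The time-domain model in contribution 2 is described only at a high level, so the following is a minimal sketch of the general idea rather than the thesis's architecture: several one-dimensional kernels of different lengths are slid over the raw waveform with a large stride, and a simple recurrence accumulates the resulting frames, so that inputs of any length yield one summary value. The 16 kHz sample rate, the kernel lengths (16, 64, 256), the stride of 160 samples, the random untrained weights, and the scalar hidden state are all assumptions chosen for illustration.

```python
import math
import random


def strided_conv1d(signal, kernel, stride):
    """Valid-mode 1-D cross-correlation of `signal` with `kernel` at a given stride."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def multi_scale_features(signal, kernels, stride):
    """Apply kernels of different lengths and stack their outputs frame by frame.

    Longer kernels produce fewer frames, so outputs are truncated to the
    shortest one before stacking.
    """
    outputs = [strided_conv1d(signal, kern, stride) for kern in kernels]
    n_frames = min(len(out) for out in outputs)
    return [[out[t] for out in outputs] for t in range(n_frames)]


def rnn_summary(frames, w_in, w_rec):
    """Elman-style recurrence h_t = tanh(w_in . x_t + w_rec * h_{t-1}).

    A scalar hidden state is used for brevity (an assumption of this sketch).
    """
    h = 0.0
    for x in frames:
        h = math.tanh(sum(w * xi for w, xi in zip(w_in, x)) + w_rec * h)
    return h


# Hypothetical settings, not taken from the thesis.
SAMPLE_RATE = 16000
STRIDE = 160
rng = random.Random(0)
kernels = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for n in (16, 64, 256)]
w_in = [rng.uniform(-1.0, 1.0) for _ in kernels]
w_rec = 0.5


def make_tone(seconds):
    """Synthetic stand-in for a speech segment: a 440 Hz sine wave."""
    n = int(seconds * SAMPLE_RATE)
    return [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(n)]


frames_short = multi_scale_features(make_tone(0.2), kernels, STRIDE)
frames_long = multi_scale_features(make_tone(0.5), kernels, STRIDE)
score_short = rnn_summary(frames_short, w_in, w_rec)
score_long = rnn_summary(frames_long, w_in, w_rec)

# The frame count grows with segment length; the summary is always one value.
print(len(frames_short), len(frames_long))
```

With these settings the 0.2-second segment yields 19 frames and the 0.5-second segment yields 49, while both reduce to a single summary value. This mirrors why a recurrent model can handle segments of different lengths where a fixed-input CNN cannot. A trained model would learn the kernel and recurrence weights and add a classification layer on top.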
Keywords/Search Tags: Voice replay attack, Convolutional neural network, Recurrent neural network, Transfer learning