
Research On Speech Spoofing Detection Based On Attention Mechanism And End-to-End Model

Posted on: 2022-10-29 | Degree: Master | Type: Thesis
Country: China | Candidate: L C Huang
GTID: 2518306572991469 | Subject: Computer application technology
Abstract:
With the rapid development of deep learning, speech generation technologies such as speech synthesis and voice conversion have matured and can now produce highly natural, fluent, and realistic speech. Although deep speech generation brings new forms of entertainment to everyday life, it also poses a serious security threat to automatic speaker verification (ASV) systems. The research community has therefore begun to design dedicated ASV spoofing detection systems. However, the neural networks used by recent spoofing detection algorithms are classic architectures designed for image tasks, and their performance on speech is not ideal. To detect artifacts in the speech signal effectively, this thesis optimizes and improves the network structure according to the characteristics of spoofed speech. The main research contents are as follows:

1. Because traditional convolutional networks are poorly suited to ASV spoofing detection, this thesis improves the network structure along three dimensions: frequency, channel, and time. Traditional convolutional networks cannot capture the correlations between harmonics along the frequency axis of fake speech; as the network deepens, the number of channels grows large and introduces information redundancy; and the final global average pooling layer easily discards useful information. To address these problems, the thesis proposes a frequency attention module, a channel attention module, and a temporal self-attention layer, which refine the extracted acoustic features into a more discriminative representation.

2. Because hand-crafted acoustic feature extraction loses information, and no single feature can detect multiple spoofing attack algorithms at once, this thesis approaches spoofing detection in an end-to-end manner. Based on an analysis of how the Fourier transform is computed in speech signal processing, it proposes performing the time-frequency conversion with a time-domain convolution and verifies the feasibility of this approach experimentally. Because a free time-domain convolution has too many parameters and fails to learn effective filters, the sinc function from SincNet is introduced to build band-pass filter convolutions with only two learnable parameters per filter (the low and high cutoff frequencies), which reduces the parameter count and improves performance. In addition, inspired by RawNet2, the thesis replaces the 2-D convolutional residual blocks with 1-D time-domain convolutional residual blocks and uses a gated recurrent unit (GRU) network to model frame-level features.

3. Extensive comparative experiments on the ASVspoof 2019 LA dataset demonstrate the effectiveness and feasibility of the proposed methods. The best attention-based system in this thesis achieves an equal error rate (EER) of 1.87%, surpassing all known single-system models.
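The abstract does not specify how the frequency and channel attention modules are built. As a rough illustration only, the sketch below assumes squeeze-and-excitation style gating: each module averages a (channels, frequency, time) feature map down to one descriptor per channel or per frequency bin, passes it through a small two-layer bottleneck, and rescales the map with sigmoid gates. The function names, the reduction ratio `r` and the random weights are hypothetical stand-ins for trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_gate(desc, w1, w2):
    """Squeeze-and-excitation style gate: FC -> ReLU -> FC -> sigmoid."""
    hidden = np.maximum(0.0, w1 @ desc)
    return sigmoid(w2 @ hidden)

def channel_attention(x, w1, w2):
    # x: (C, F, T). Squeeze over frequency and time -> one descriptor per channel.
    desc = x.mean(axis=(1, 2))            # (C,)
    gate = se_gate(desc, w1, w2)          # (C,), values in (0, 1)
    return x * gate[:, None, None]        # reweight whole channels

def frequency_attention(x, w1, w2):
    # Squeeze over channels and time -> one descriptor per frequency bin.
    desc = x.mean(axis=(0, 2))            # (F,)
    gate = se_gate(desc, w1, w2)          # (F,), values in (0, 1)
    return x * gate[None, :, None]        # reweight whole frequency bins

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
C, F, T, r = 16, 32, 50, 4
x = rng.standard_normal((C, F, T))
wc1 = rng.standard_normal((C // r, C)) * 0.1
wc2 = rng.standard_normal((C, C // r)) * 0.1
wf1 = rng.standard_normal((F // r, F)) * 0.1
wf2 = rng.standard_normal((F, F // r)) * 0.1

y = frequency_attention(channel_attention(x, wc1, wc2), wf1, wf2)
print(y.shape)  # (16, 32, 50)
```

Because every gate lies in (0, 1), the modules can only attenuate channels or frequency bins; which ones are suppressed is what training would learn.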
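The exact form of the temporal self-attention layer is also not given here. One common way to replace global average pooling is self-attentive pooling, sketched below as an assumption rather than the thesis design: each frame receives a softmax weight from a learned scoring vector, so informative frames can dominate the utterance embedding instead of being averaged away. The scoring vector `w` and the function name are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attentive_pool(h, w):
    """h: (T, D) frame-level features; w: (D,) hypothetical learned scoring vector.
    Returns a (D,) utterance embedding as an attention-weighted frame average."""
    alpha = softmax(h @ w)          # (T,) non-negative weights summing to 1
    return alpha @ h, alpha

rng = np.random.default_rng(1)
h = rng.standard_normal((100, 64))  # 100 frames, 64-dim features
emb, alpha = attentive_pool(h, rng.standard_normal(64))

# With an all-zero scoring vector every frame gets weight 1/T,
# so the layer reduces to ordinary global average pooling.
emb_avg, _ = attentive_pool(h, np.zeros(64))
print(np.allclose(emb_avg, h.mean(axis=0)))  # True
```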
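The idea behind using a time-domain convolution for the time-frequency conversion in point 2 is that each DFT bin is an inner product of a frame with a cosine and a sine. A 1-D convolution whose kernels are fixed windowed cosines and sines, applied with a stride equal to the hop size, therefore reproduces the STFT. The NumPy sketch below checks this numerically; the frame length (512), hop (160) and Hann window are illustrative choices, not values from the thesis, and the strided convolution is written as framing followed by a matrix product.

```python
import numpy as np

def conv_stft(x, n_fft=512, hop=160):
    """STFT magnitude computed as a strided 1-D convolution with fixed
    windowed cosine/sine kernels (one kernel pair per frequency bin)."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]                     # (K, 1) bin indices
    window = np.hanning(n_fft)
    cos_kernels = np.cos(2 * np.pi * k * n / n_fft) * window   # (K, n_fft)
    sin_kernels = -np.sin(2 * np.pi * k * n / n_fft) * window  # (K, n_fft)
    # Strided "valid" convolution (cross-correlation, as in CNN layers),
    # expressed as slicing overlapping frames and a matrix product.
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])
    real = frames @ cos_kernels.T                              # (frames, K)
    imag = frames @ sin_kernels.T
    return np.sqrt(real ** 2 + imag ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)      # 1 s of noise at an assumed 16 kHz
mag = conv_stft(x)

# Reference: window each frame and take the real FFT.
ref = np.abs(np.fft.rfft(
    np.stack([x[i * 160 : i * 160 + 512] * np.hanning(512)
              for i in range(mag.shape[0])]), axis=1))
print(np.allclose(mag, ref))  # True
```

Once the transform is a convolution, its kernels can be made trainable, which is what motivates the learnable front end; the next step addresses the resulting parameter count.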
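The SincNet-style filters can be written down directly: a band-pass kernel is the difference of two low-pass sinc kernels, so it is fully determined by its two cutoff frequencies, whereas a free 251-tap kernel has 251 weights. The sketch below builds one such kernel in NumPy with fixed cutoffs; in a trainable layer `f1_hz` and `f2_hz` would be the learned parameters. The function name, kernel length, sampling rate and Hamming window are assumptions for illustration.

```python
import numpy as np

def sinc_bandpass(f1_hz, f2_hz, fs=16000, length=251):
    """Band-pass FIR kernel defined by two parameters (low and high cutoff),
    g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), with sinc(x) = sin(x)/x."""
    f1, f2 = f1_hz / fs, f2_hz / fs              # normalised cutoffs (cycles/sample)
    n = np.arange(length) - (length - 1) / 2     # symmetric time axis centred on 0
    # np.sinc(x) = sin(pi*x)/(pi*x), so np.sinc(2*f*n) = sin(2*pi*f*n)/(2*pi*f*n)
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return g * np.hamming(length)                # taper to reduce spectral leakage

h = sinc_bandpass(300, 3000)
# With n=16000 at fs=16000, rfft bin k corresponds to k Hz.
H = np.abs(np.fft.rfft(h, n=16000))
print(len(h), H[1000], H[6000])
```

The magnitude response is close to 1 well inside the 300 to 3000 Hz band and close to 0 well outside it, so the layer only has to learn where each band sits rather than every tap.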
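The 1.87% figure is an equal error rate (EER): the error rate at the decision threshold where the false-acceptance rate for spoofed trials equals the false-rejection rate for bona fide trials. Below is a minimal sketch of that computation, assuming higher scores mean "more likely bona fide" and taking the threshold with the smallest FAR/FRR gap; it is not the official ASVspoof evaluation script, and the toy scores are made up.

```python
import numpy as np

def compute_eer(bonafide, spoof):
    """Return (EER, threshold). A trial is accepted when score >= threshold."""
    bonafide, spoof = np.asarray(bonafide), np.asarray(spoof)
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    far = np.array([(spoof >= t).mean() for t in thresholds])    # spoof accepted
    frr = np.array([(bonafide < t).mean() for t in thresholds])  # bona fide rejected
    i = np.argmin(np.abs(far - frr))                              # closest crossing
    return (far[i] + frr[i]) / 2, thresholds[i]

# Toy scores: one spoof trial outranks one bona fide trial.
eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
# eer = 0.25 at threshold 0.6
```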
Keywords: Deep learning, Automatic speaker recognition, ASV spoofing detection, Attention mechanism, End-to-end model