
Research On Recording Playback Detection Based On Constant Q Transform And Graph Fourier Transform

Posted on: 2022-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: M X Tian
GTID: 2518306779468844
Subject: Computer Software and Application of Computer
Abstract/Summary:
With the rapid development of science and technology, speaker recognition systems are increasingly used in daily life, and with them come attacks by various kinds of disguised speech. Among these, recording playback attacks pose a great threat to speaker recognition systems, because the quality of recording equipment keeps improving and the attacker needs no professional knowledge. To ensure the application security of speaker recognition systems, resisting playback attacks has become an urgent problem. To this end, this thesis proposes two feature extraction methods suited to recording playback detection: cepstral coefficients based on the constant-Q transform (CQT) and cepstral coefficients based on the graph Fourier transform (GFT).

The energy of human speech is concentrated mostly at low frequencies. The commonly used time-frequency analysis method is the short-time Fourier transform (STFT), which suffers from problems such as periodic truncation at low frequencies and therefore gives poor low-frequency resolution for speech. The constant-Q transform addresses this problem well: it provides higher resolution at low frequencies and reflects the characteristics of the original sound more completely. This thesis proposes constant-Q variance-based cepstrum coefficients (CVCC) and constant-Q mean-based cepstrum coefficients (CMCC). By adding the mean or variance of the samples to the magnitude spectrum after the CQT, these two features further emphasize the high-frequency nonlinear distortion of playback recordings, which helps the system distinguish playback speech from genuine speech.

Compared with traditional digital signal processing, graph signal processing can express the correlation between speech sampling points more accurately and uncover more of the hidden information between them. The speech graph signal is constructed using the combined shift operator. On this basis, graph Fourier analysis of the speech is carried out in the graph domain to extract the graph frequency cepstral coefficients (GFCC) feature. Compared with the fast Fourier transform, the graph Fourier transform represents the structural relationship between speech sampling points more accurately, which makes genuine speech and playback speech highly distinguishable in the frequency domain.

In the experiments, the playback detection system composed of the GFCC feature and a light convolutional neural network performs considerably better than the ASVspoof 2017 v2 baseline system CQCC-GMM in terms of equal error rate (EER). In terms of both EER and the tandem detection cost function (t-DCF), the GFCC-based system performs much better than the LFCC-GMM baseline system of ASVspoof 2019. On the ASVspoof 2017 v2 evaluation set, the two proposed playback detection systems, based on the constant-Q transform and the graph Fourier transform, achieve EERs of 14.05% and 10.96% respectively; compared with the baseline system CQCC-SDA-GMM, performance is improved by 16.29% and 28% respectively. On the ASVspoof 2019 physical access evaluation set, the two systems achieve EERs of 3.1664% and 1.51% respectively; compared with the baseline system LFCC-GMM, performance is improved by 76% and 89% respectively. The experimental results show that the two features can effectively resist recording playback attacks and, to a certain extent, improve the security of speaker recognition systems in practical applications.
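As an illustration of the graph Fourier transform mentioned in the abstract, the following is a minimal Python sketch and not the thesis's implementation. The abstract does not define its combined shift operator, so the sketch assumes a plain unweighted path graph that links consecutive samples of a frame and uses its Laplacian instead. The function names (`path_graph_laplacian`, `gft_basis`, `gft`, `igft`) and the 64-sample random frame are hypothetical.

```python
import numpy as np


def path_graph_laplacian(n):
    """Laplacian L = D - A of an unweighted path graph over n samples.

    Assumption: consecutive samples are joined by an edge of weight 1.
    This stands in for the thesis's combined shift operator, which the
    abstract does not specify.
    """
    adjacency = np.zeros((n, n))
    idx = np.arange(n - 1)
    adjacency[idx, idx + 1] = 1.0
    adjacency[idx + 1, idx] = 1.0
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def gft_basis(laplacian):
    """Eigen-decompose a symmetric Laplacian.

    Returns the eigenvalues (graph frequencies, ascending) and an
    orthogonal matrix whose columns are the graph Fourier modes.
    """
    graph_freqs, basis = np.linalg.eigh(laplacian)
    return graph_freqs, basis


def gft(frame, basis):
    """Forward graph Fourier transform: project the frame onto the modes."""
    return basis.T @ frame


def igft(spectrum, basis):
    """Inverse graph Fourier transform."""
    return basis @ spectrum


# Hypothetical 64-sample frame standing in for a windowed speech segment.
rng = np.random.default_rng(0)
frame = rng.standard_normal(64)

graph_freqs, basis = gft_basis(path_graph_laplacian(frame.size))
spectrum = gft(frame, basis)
reconstructed = igft(spectrum, basis)
```

Because this Laplacian is symmetric, the eigenvector matrix is orthogonal, so the transform is invertible and preserves signal energy. A cepstral feature such as GFCC would be computed from a spectrum of this kind, using steps the abstract does not detail.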
Keywords/Search Tags: Playback Detection, Graph Fourier Transformation, Graph Frequency Cepstral Coefficients, Constant Q Transform, Constant Q Variance-based Cepstrum Coefficients, Constant Q Mean-based Cepstrum Coefficients