
Research On Key Technologies Of Replay Speech Detection In Multiple Scenarios

Posted on: 2020-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Lin
Full Text: PDF
GTID: 2428330626951319
Subject: Engineering
Abstract/Summary:
While voiceprint recognition technology continues to evolve, spoofing attacks on voiceprint recognition systems are becoming increasingly serious. Replay attacks are considered the easiest and most effective of these attacks, mainly because replay speech is derived directly from the genuine speech of the target speaker and does not require the attacker to have any special signal processing knowledge. Moreover, with the development of audio encoding and decoding technology, high-fidelity recording and playback equipment makes replay attacks on voiceprint authentication systems easier to carry out. How to detect replay speech effectively has therefore become an urgent problem. This thesis focuses on the key technologies of replay speech detection in multiple scenarios. We analyze in depth the principles and models of existing research methods and find that existing algorithms suffer from low robustness and high algorithmic complexity. To address these shortcomings, three pieces of research work were carried out, as follows.

Firstly, the difference between replay speech and genuine speech is analyzed in detail in the time domain and the frequency domain. The analysis shows that replay speech obtained with different replay configurations exhibits different degrees of distortion, concentrated mainly in the high-frequency and low-frequency regions. To explore the impact of the recording equipment, playback equipment, and acoustic environment on replay speech quality, we propose a relative entropy feature. The experimental results show that the playback device is the main factor affecting replay speech quality: when the playback device changes from low quality to high quality, the relative entropy has the largest fluctuation range, decreasing from 0.83 to 0.12. The secondary factor is the recording device, whose relative entropy changes from 0.25 to 0.05 as the quality of the device changes. The least influential factor is the acoustic environment, for which the relative entropy decreases from 0.27 to 0.11 when the acoustic environment changes.

Secondly, we propose two highly robust replay speech detection algorithms. For the first, we perform a detailed sub-band analysis of replay speech at the feature level and the classifier level. The number and type of filters in each sub-band are then modified according to the equal error rate ratio, yielding a replay speech detection algorithm based on a modified cepstral feature. The experimental results show that the proposed method improves significantly on the existing algorithm: its EER (Equal Error Rate) is 9.77%, a 59% improvement over the baseline of the database. The second scheme applies band-stop filtering to the speech signal in the time domain, retaining only the low-frequency and high-frequency components that carry discriminative information, and then extracts cepstral coefficients from the filtered signal. The experimental results show that this method detects replay speech effectively, with an EER of 10.34%, a 57.9% improvement over the baseline system.

Finally, we recommend using normalization methods for channel compensation to improve the performance of the detection algorithms. By establishing a mathematical model of replay speech, we study the distribution of channel information in replay speech. To verify the validity of the proposed methods, we applied four different normalization methods to the six cepstral features discussed in this thesis. The experimental results show that all four normalization methods improve the detection performance of existing algorithms to different degrees. Among them, CMVN (Cepstral Mean and Variance Normalization) and QCN (Quantile-based Cepstral Dynamics Normalization) achieve the best performance. In the baseline system, these two normalization methods improved performance by 43.30% and 36.95%, respectively. In our method, the performance improvement from each of the two normalization methods exceeds 65.00%.
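
To make the channel-compensation idea concrete, the following is a minimal sketch of per-utterance CMVN in plain Python. It is not the thesis's implementation: the function name `cmvn`, the `eps` guard, and the toy two-coefficient utterance are assumptions made for illustration. The sketch relies on the standard observation that a fixed channel shows up as a roughly constant offset in the cepstral domain, so subtracting the per-coefficient mean removes it, and dividing by the standard deviation equalizes the scale.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of frames, each a list of cepstral coefficients.
    Returns new frames in which every coefficient dimension has
    (approximately) zero mean and unit variance over the utterance.
    eps is an assumed small constant that avoids division by zero.
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    # Population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n_frames)
        for k in range(n_coeffs)
    ]
    return [
        [(f[k] - means[k]) / (stds[k] + eps) for k in range(n_coeffs)]
        for f in frames
    ]


# Toy utterance: 3 frames, 2 cepstral coefficients (illustrative values only).
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(utterance)

# The same utterance with a constant per-coefficient offset added,
# standing in for a different (hypothetical) channel.
shifted = [[f[0] + 5.0, f[1] - 3.0] for f in utterance]
normalized_shifted = cmvn(shifted)
```

In this sketch, `normalized` and `normalized_shifted` agree up to floating-point rounding, which is the property that makes mean normalization useful when the recording and playback channels differ between genuine and replayed speech.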
Keywords/Search Tags: Voiceprint recognition, replay speech, cepstral, band-stop filters, channel compensation