Automatic Speaker Verification (ASV) systems authenticate a speaker's identity by analyzing their voice. Deep learning has made these systems popular, offering high accuracy and user-friendliness. However, ASV systems are vulnerable to spoofing attacks: attackers can exploit techniques such as replay, speech synthesis, and voice conversion to generate fake speech and undermine ASV systems. Consequently, it is crucial to investigate effective spoofed-speech detection methods to mitigate the threat posed by ASV attacks. Based on datasets from the ASVspoof 2019 and ASVspoof 2021 challenges, this paper conducts research on multiple aspects, including front-end feature extraction, back-end classifier selection, loss function design, and model fusion. The research achieves the following advancements:

(1) A synthetic-speech spoofing detection method based on online hard example mining is proposed. By selecting hard samples with high training loss values for online feedback training, the method effectively addresses the imbalanced distribution of easy and hard samples in the training set. Experimental results show that introducing the Online Hard Example Mining (OHEM) algorithm yields relative reductions in equal error rate (EER) of 42%, 28%, 25%, and 22% for four deep neural network models, namely ResNet18, ResNet50, SE-Res2Net, and Raw-Res2Net, respectively.

(2) A new network architecture called Raw-Res2Net is proposed. Compared with the RawNet2 model, this model replaces the residual blocks with Res2Net blocks and employs a squeeze-and-excitation mechanism for feature map scaling. Res2Net enhances the representation of multi-scale features and expands the receptive field of each layer, while squeeze-and-excitation blocks recalibrate channel-wise feature responses by explicitly modeling channel interdependencies. Experimental results demonstrate that, with the OHEM algorithm, the proposed model reduces the EER relative to the RawNet2 model by 35%. Compared with the two baseline systems of the ASVspoof 2019 challenge, the EER is relatively reduced by 63% and 68%, respectively.

(3) A replay-attack speech detection method based on a dual-input hierarchical fusion network is proposed. This method takes the original signal and its time-reversed version as the model's two inputs and introduces a hierarchical fusion module to effectively fuse the outputs of the corresponding residual blocks of the upper and lower branches. On the ASVspoof 2021 PA test set, this method achieves strong performance, with an EER of 24.46% and a min t-DCF of 0.6708. Compared with the four baseline systems of the ASVspoof 2021 challenge, the min t-DCF is relatively reduced by 28.9%, 31.0%, 32.6%, and 32.9%, respectively.
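The OHEM idea in (1) can be sketched as a simple per-batch selection step: compute a loss for every sample, then backpropagate only through the hardest ones. This is a minimal illustration, not the paper's implementation; the `keep_ratio` hyperparameter and the function name are assumptions.

```python
def ohem_select(losses, keep_ratio=0.5):
    """Return indices of the hardest samples in a batch.

    OHEM keeps only the samples with the highest per-sample loss,
    so that hard examples dominate the gradient update.
    `keep_ratio` is a hypothetical hyperparameter controlling how
    large a fraction of the batch is kept.
    """
    k = max(1, int(len(losses) * keep_ratio))
    # Sort sample indices by loss, descending; keep the top-k hardest.
    return sorted(range(len(losses)),
                  key=lambda i: losses[i], reverse=True)[:k]

# Example: a batch of six per-sample loss values.
losses = [0.1, 2.3, 0.05, 1.7, 0.4, 0.9]
print(ohem_select(losses, keep_ratio=0.5))  # hardest half: [1, 3, 5]
```

In a training loop, the selected indices would feed the backward pass, while the easy samples are discarded for that iteration.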
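The dual-input scheme in (3) can likewise be sketched in miniature: the second input is simply the time-reversed waveform, and a fusion module combines the corresponding block outputs of the two branches. The elementwise-mean fusion rule below is a hypothetical stand-in, since the abstract does not specify the exact fusion operation.

```python
def time_reverse(signal):
    # Second network input: the time-reversed copy of the waveform.
    return signal[::-1]

def hierarchical_fuse(upper_blocks, lower_blocks):
    """Fuse per-level outputs of the two branches.

    Hypothetical fusion rule: for each level, take the elementwise
    mean of the corresponding residual-block outputs from the upper
    (original-signal) and lower (reversed-signal) branches.
    """
    return [[(u + l) / 2.0 for u, l in zip(ub, lb)]
            for ub, lb in zip(upper_blocks, lower_blocks)]

# Toy example: two branches, each with two residual-block outputs.
upper = [[1.0, 2.0], [3.0, 4.0]]
lower = [[0.0, 2.0], [1.0, 0.0]]
print(hierarchical_fuse(upper, lower))  # [[0.5, 2.0], [2.0, 2.0]]
```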