| As a biometric technology, automatic speaker verification (ASV) has many applications, such as smart-home devices, bank identity verification, and the identification of criminal suspects. The security of ASV is therefore an important research topic. In recent years, speech synthesis has matured to the point that it can generate audio that is difficult to distinguish from genuine speech. This poses a serious threat to ASV: an attacker can use synthesized speech to illegally obtain a user’s access rights in voice-based access control. Synthetic speech detection is therefore key to ensuring the security of ASV systems, and it is the main subject of this paper. The main work and contributions are as follows. 1. Different acoustic features are used for synthetic speech detection, and the models trained on the different features are fused. First, LFCC features are chosen as input, a ResNet-based structure is designed for the task, and different loss functions are compared. Then, the performance of FBANK features under different window lengths and window functions is analyzed, and a data augmentation scheme is designed to further improve detection performance. Finally, a pre-trained model is adopted and different fine-tuning strategies are tried. The final fused model achieves an EER of 0.12% and a min t-DCF of 0.0063. 2. Synthetic speech detection needs to serve ASV systems, and telephony is a common ASV scenario. Speech transmitted across telephone networks such as VoIP and PSTN passes through codecs, so synthetic speech detection must be robust to different codecs. To address this, this paper proposes a codec-based data augmentation method and the MKTDNN structure. In every convolution block of MKTDNN, the feature maps produced by convolutions with different kernel sizes are fused efficiently by a feature fusion module. This system achieves an EER of 3.85% and a min t-DCF of 0.2753 on ASVspoof 2021, a relative improvement of 17.9% in EER and 4.5% in min t-DCF over the best baseline. 3. This paper explores how to apply synthetic speech detection systems to ASV systems. Separate methods are designed to improve the performance of the ASV system and of the synthetic speech detection system. PLDA is chosen as the ASV back end. Because the test data and training data are mismatched, domain adaptation is applied to improve speaker verification performance. The synthetic speech detection system continues the approach used in the preceding research. A score-based fusion method is designed to fuse the two systems. The final SASV-EER is 1.04% and the SV-EER is 1.46% in the SASV challenge. |
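
The results above are reported as equal error rates (EER) and as relative improvements over a baseline. The following is a minimal sketch of how these two quantities are commonly computed from detection scores. It is not the evaluation code used in this work. The function names `compute_eer` and `relative_improvement` are hypothetical, and the sketch assumes that a higher score means the trial is more likely to be genuine.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate from two sets of scores.

    Assumption: higher scores indicate the target (genuine) class.
    Returns (eer, threshold) at the candidate threshold where the
    false acceptance and false rejection rates are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))
    # False rejection: a target trial scores below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: a non-target trial scores at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

def relative_improvement(baseline, system):
    """Relative reduction of an error metric with respect to a baseline."""
    return (baseline - system) / baseline

if __name__ == "__main__":
    # Toy scores for illustration only, not data from this work.
    eer, thr = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

A relative improvement of 17.9% in EER means that `relative_improvement(baseline_eer, system_eer)` evaluates to about 0.179. It is a ratio of error rates, not a difference in percentage points.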