
A Study on the Countermeasures of the Automatic Speaker Verification System Against Synthetic Speech

Posted on: 2021-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
Full Text: PDF
GTID: 2518306548481824
Subject: Computer technology
Abstract/Summary:
Automatic speaker verification (ASV) is a commonly used verification technology, widely deployed in scenarios such as bank identity verification and mobile phone unlocking and login. Recently, criminals have begun to use speech synthesis and voice conversion technologies to attack ASV systems in order to steal information and money, which poses serious security risks to these systems. At present, most speech synthesis and voice conversion technologies generate speech through phoneme-based concatenation and adjustment. Speech generated in this way differs noticeably from natural speech, and the differences vary from phoneme to phoneme. If this difference information can be used effectively, the performance of synthesized speech detection can be improved considerably. In addition, speech synthesis usually works from text and does not model the emotion in speech, so analyzing the emotion in speech is another effective way to distinguish synthesized speech from natural speech. To address these problems, this thesis proposes two algorithms for detecting synthesized speech. First, a phoneme-level F-Ratio analysis is used to find the frequency bands in which the differences between synthesized and natural speech are concentrated in the frequency domain, and the filter used for feature extraction is then modified according to the analysis result. Second, to exploit the lack of emotion in synthesized speech, a method is proposed that extracts emotion features from speech using a pre-trained emotion recognition network. The experimental results show that both methods are able to detect synthetic speech. On the ASVspoof 2019 LA dataset, the phoneme-analysis-based method achieves a better EER and t-DCF than the best single-system results. The method based on emotion features shows generalization ability.
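The F-Ratio analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula or features used in the thesis, so the code below assumes the common two-class definition (variance of the class means divided by the average within-class variance, computed independently for each frequency band). The function name `f_ratio_per_band`, the input layout (one list of per-band values per frame, all frames belonging to a single phoneme), and the `eps` guard are hypothetical choices made for illustration.

```python
from statistics import fmean, pvariance


def f_ratio_per_band(natural, synthetic, eps=1e-12):
    """Return one F-ratio per frequency band for two classes of frames.

    natural, synthetic: lists of frames; each frame is a list of per-band
    values (for example log-energies). This layout is an assumption made
    for the sketch, not the thesis's actual feature format.
    """
    n_bands = len(natural[0])
    ratios = []
    for k in range(n_bands):
        nat_k = [frame[k] for frame in natural]
        syn_k = [frame[k] for frame in synthetic]
        mu_nat, mu_syn = fmean(nat_k), fmean(syn_k)
        grand_mean = (mu_nat + mu_syn) / 2
        # Between-class variance: spread of the two class means.
        between = ((mu_nat - grand_mean) ** 2 + (mu_syn - grand_mean) ** 2) / 2
        # Within-class variance: average spread inside each class.
        within = (pvariance(nat_k, mu_nat) + pvariance(syn_k, mu_syn)) / 2
        # eps avoids division by zero for constant bands.
        ratios.append(between / (within + eps))
    return ratios
```

Bands with a high ratio are those where the two classes are well separated relative to their internal spread. Under the approach described above, such bands would be candidates for finer filter resolution, with the analysis repeated separately for each phoneme.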
Keywords/Search Tags: Synthetic Speech Detection, Automatic Speaker Verification System, Anti-spoofing, Phoneme Analysis, Emotion Feature