A Study On The Countermeasures Of The Automatic Speaker Verification System Against Synthetic Speech

Posted on:2021-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:C Liu

GTID:2518306548481824

Subject:Computer technology

Abstract/Summary:

As a commonly used verification technology, automatic speaker verification (ASV) is widely deployed in scenarios such as bank identity verification and mobile phone unlocking. Recently, criminals have begun to use speech synthesis and voice conversion technologies to attack ASV systems in order to steal information and money, which poses serious security risks to these systems.

At present, most speech synthesis and voice conversion technologies generate speech through phoneme-based concatenation and adjustment. Speech generated this way differs noticeably from natural speech, and the differences vary from phoneme to phoneme. If this difference information can be used effectively, the performance of synthesized speech detection can be improved. In addition, speech synthesis usually works from text and does not model the emotion in speech, so analyzing the emotion in speech is also an effective way to distinguish synthesized speech.

To address these problems, this thesis proposes two algorithms for detecting synthesized speech. First, a phoneme-level F-Ratio analysis is used to find the frequency bands in which the differences between synthesized and natural speech are concentrated, and the filter used in feature extraction is then modified according to the analysis result. Second, to exploit the lack of emotion in synthesized speech, a method is proposed that extracts emotion features from speech using a pre-trained emotion recognition network.

Experimental results show that both methods are able to detect synthetic speech. The phoneme-analysis-based method achieves a better EER and t-DCF on the ASVspoof 2019 LA dataset than the best single-system results, and the emotion-feature-based method shows generalization ability.
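
The F-Ratio analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes one common definition: per frequency band, the variance of the class means divided by the average within-class variance. The function name `f_ratio`, the array shapes, and the random test data are all hypothetical and serve only as an illustration; in a phoneme-level analysis the two inputs would presumably be frames of a single phoneme taken from natural and from synthesized speech.

```python
import numpy as np

def f_ratio(natural, synthetic):
    """Per-band F-ratio between two classes of spectral frames.

    natural, synthetic: arrays of shape (n_frames, n_bands), for example
    log filter-bank energies of frames that belong to one phoneme.
    Assumed definition: between-class variance / mean within-class variance.
    """
    classes = [np.asarray(natural, dtype=float),
               np.asarray(synthetic, dtype=float)]
    # Mean spectrum of each class, shape (2, n_bands).
    class_means = np.stack([c.mean(axis=0) for c in classes])
    overall_mean = class_means.mean(axis=0)
    # Spread of the class means around the overall mean, per band.
    between = ((class_means - overall_mean) ** 2).mean(axis=0)
    # Average spread of frames around their own class mean, per band.
    within = np.stack([c.var(axis=0) for c in classes]).mean(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

# Hypothetical data: 8 bands, where only band 5 differs between the classes.
rng = np.random.default_rng(0)
n_bands = 8
natural = rng.normal(0.0, 1.0, size=(500, n_bands))
synthetic = rng.normal(0.0, 1.0, size=(500, n_bands))
synthetic[:, 5] += 3.0

scores = f_ratio(natural, synthetic)
top_band = int(np.argmax(scores))
print("F-ratio per band:", np.round(scores, 3))
print("most discriminative band:", top_band)
```

Bands with a high F-ratio are those where the two classes are well separated relative to their internal variation. Under this reading, such bands are candidates for finer filter resolution when the feature's filter is redesigned, although how the thesis maps the scores onto the filter is not stated in the abstract.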

Keywords/Search Tags:

Synthetic Speech Detection, Automatic Speaker Verification System, Anti-spoofing, Phoneme Analysis, Emotion Feature

Related items

1	Research On Synthetic Speech Detection And Application In Automatic Speaker Verification
2	Automatic speechreading for improved speech recognition and speaker verification
3	Study On The Deception Detection Method Identified By The Automatic Speaker Verification System
4	Research On Speech Emotion Recognition Methods
5	Deep Learning Based Speech Emotion Recognition Research
6	Application Of Playback Speech Detection Method Based On AdaBoost Algorithm In Automatic Speaker Verification
7	Research On Non-specific Speaker Speech Emotion Recognition Based On Deep Feature Extraction And Processing
8	Research On Speaker-independent Speech Emotion Recognition Based On Deep Learning
9	Research On Synthetic And Converted Speech Detection Based On Multi-branch Convolutional Neural Network
10	Analysis Of Speaker Roles For Multi-speaker Conversational Speech