Research On Speech Spoofing Detection Based On Feature Fusion And Residual Attention

Posted on:2024-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:J M Kang

Full Text:PDF

GTID:2568306941994719

Subject:Electronic information

Abstract/Summary:

The emergence of various spoofing attacks poses a serious threat to automatic speaker verification (ASV) systems and can even harm social security, so speech spoofing detection is attracting growing attention from experts and scholars. At present, research focuses on hand-crafted feature extraction and traditional convolutional neural network methods. Hand-crafted feature methods require the speech to be preprocessed first, which can lose some deep-level features, while traditional convolutional neural networks generalize poorly and cannot guarantee robustness against unknown types of spoofing attacks. To address these problems, this study improves the feature extraction stage and the recognition model separately and proposes a corresponding spoofing detection method for each. The main work of this paper is as follows.

First, to address the poor discriminative power of existing acoustic features in the field, a speech spoofing detection method based on spectral-temporal feature fusion is proposed. Because the artifacts that distinguish genuine speech from spoofed speech lie in different frequency bands or subbands, the frequency-domain and time-domain information of the speech is extracted separately, and the acoustic features captured by the different channels are made to complement one another through feature fusion, yielding more global speaker information. Experimental results show that the detection algorithm achieves an equal error rate (EER) of 3.78% and a tandem detection cost function (t-DCF) of 0.1035 on the logical access (LA) corpus of ASVspoof2019. Compared with the best algorithm among the baseline systems, these are improvements of 53.3% and 51.1%, respectively.

Second, to address the poor generalization of existing models to unknown spoofing attacks, a speech spoofing detection method based on a multi-channel residual attention network is proposed. Because a single feature is not enough to capture the global spoofing cues, multi-channel fused spectrogram features are adopted, a residual attention learning mechanism assigns different attention to the spectrogram features of different channels, and an improved ResNet learns more refined acoustic information for classification. The experimental results show that the detection algorithm achieves an EER of 3.60% and a t-DCF of 0.0931 on the LA corpus of ASVspoof2019. Compared with the best algorithm among the baseline systems, these are improvements of 55.5% and 56.0%, respectively.

The comparison experiments show that the proposed methods significantly improve prediction accuracy over previous methods, and that both the spectral-temporal feature fusion method and the residual attention network generalize better on speech-related tasks than traditional speech spoofing detection methods.
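For readers unfamiliar with the EER metric quoted above, the following is a minimal sketch of how an equal error rate can be computed from detection scores, and of how the reported improvements relate to a baseline. It is not the thesis's evaluation code: the function names, the score convention (higher means more likely genuine), and the toy scores are all assumptions made for illustration. The baseline values it derives are only those implied by the figures in the abstract, which does not state them directly.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (EER, threshold) for a detector.

    Assumption: higher scores mean "more likely genuine speech".
    The EER is the operating point where the false acceptance rate
    (spoofed trials accepted) equals the false rejection rate
    (genuine trials rejected). With finitely many scores the two
    rates may not cross exactly, so the closest point is used.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


def implied_baseline(proposed, relative_improvement):
    """Baseline value implied by a proposed value and a relative reduction.

    If proposed = baseline * (1 - r), then baseline = proposed / (1 - r).
    """
    return proposed / (1.0 - relative_improvement)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
    print(f"toy EER = {eer:.2%} at threshold {threshold}")

    # Figures quoted in the abstract: (EER in %, relative improvement).
    for name, eer_pct, gain in [("feature fusion", 3.78, 0.533),
                                ("residual attention", 3.60, 0.555)]:
        print(f"{name}: implied baseline EER = {implied_baseline(eer_pct, gain):.2f}%")
```

Both reported EER improvements imply nearly the same baseline value, which is consistent with the percentages being relative reductions measured against a single baseline system.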

Keywords/Search Tags:

Speech spoofing detection, Automatic speaker verification, Spectral-temporal feature fusion, Multi-channel residual attention

Related items

1	Research On Synthetic Speech Detection And Application In Automatic Speaker Verification
2	Research On Speech Spoofing Detection Based On Attention Mechanism And End-to-End Model
3	Study On The Deception Detection Method Identified By The Automatic Speaker Verification System
4	A Study On The Countermeasures Of The Automatic Speaker Verification System Against Synthetic Speech
5	Research On Speech Spoofing Detection Based On Feature Pyramid Residual Network
6	Research On Synthetic And Converted Speech Detection Based On Multi-branch Convolutional Neural Network
7	Spoofing Speech Detection Research
8	Research On Feature Of Speaker Verification And Playback Attacks Detection
9	Analysis Of Speaker Roles For Multi-speaker Conversational Speech
10	Automatic speechreading for improved speech recognition and speaker verification