Study On Synthetic Speech Detection Algorithm Based On Deep Learning

Posted on: 2021-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: T Huang
Full Text: PDF
GTID: 2518306473974489
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of digital multimedia technology, audio signals have become an important means of everyday communication because they require little storage and are easy to edit and transmit. Because they are easy to edit, they can also be tampered with and abused for criminal purposes, so speech detection technology needs to be studied. In recent years, deep learning has brought new breakthroughs in synthetic speech technology. Attacks based on speech synthesis, voice conversion, and replay of recordings have left Automatic Speaker Verification (ASV) systems vulnerable, and existing detection techniques struggle to distinguish high-quality synthetic speech from natural speech. Synthetic speech detection has therefore attracted growing attention from researchers. This thesis studies synthetic speech detection algorithms based on artificial intelligence. The main work is summarized as follows:

(1) Because classifier design for synthetic speech detection is not yet mature, the thesis proposes a synthetic speech detection algorithm based on the Gated Recurrent Unit (GRU) and the Support Vector Machine (SVM). The proposed GRU-SVM network model is a classifier designed specifically for synthetic speech detection. The GRU is well suited to time-series data such as speech, and the SVM improves detection by performing regression on the outputs of the GRU hidden layers. The experiments use the ASVspoof2019 database and a database synthesized with the Waveglow algorithm, with Mel Frequency Cepstrum Coefficient (MFCC) parameters as the features. The experiments first examine how the MFCC feature dimension affects the detection results and then select an appropriate dimension for the subsequent experiments. In comparison with other similar classifiers, GRU-SVM reaches detection rates of 99.99% and 99.34% on the two databases, respectively, with Equal Error Rates (EER) of 0.03% and 0.15%. The experimental results show that the proposed model can effectively distinguish natural speech from synthetic speech.

(2) Traditional MFCC features make insufficient use of the high-frequency components of speech and represent speech information incompletely when used for synthetic speech detection. To address this, the thesis designs a feature specifically for synthetic speech detection by studying the traditional cepstral coefficient extraction process. The algorithm uses another recurrent neural network variant, Long Short-Term Memory (LSTM), to strengthen the feature. The LSTM network is not trained on the MFCC features directly; instead, it learns a set of weights from the energy spectrum features. These weights are used to modify and strengthen the Mel filters, which yields a new feature, LSTM-MFCC. The experiments use a Gaussian Mixture Model (GMM) to train and evaluate the features and use its scoring system for verification on two different databases. In comparison with other features, LSTM-MFCC reaches detection rates of 99.84% and 99.19%, with EERs of 0.18% and 0.89%. Overall, the proposed LSTM-MFCC-based detection algorithm performs significantly better than the other synthetic speech detection algorithms it was compared with.
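
The abstract reports Equal Error Rates but does not say how they were computed. As an illustration only, here is a minimal sketch of one common way to estimate the EER from detector scores. It assumes that a higher score means "more likely natural speech". The function name `equal_error_rate` and the sample scores are hypothetical and are not taken from the thesis.

```python
import numpy as np


def equal_error_rate(natural_scores, synthetic_scores):
    """Estimate the EER and the threshold at which it occurs.

    Assumption: higher score = more likely natural speech.
    """
    natural = np.asarray(natural_scores, dtype=float)
    synthetic = np.asarray(synthetic_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([natural, synthetic]))

    # False rejection rate: natural speech scored below the threshold.
    frr = np.array([(natural < t).mean() for t in thresholds])
    # False acceptance rate: synthetic speech scored at or above the threshold.
    far = np.array([(synthetic >= t).mean() for t in thresholds])

    # The EER is the operating point where FAR and FRR are closest;
    # average the two rates in case they do not cross exactly.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real data, the two score lists would come from the classifier's outputs on the natural and synthetic test utterances. Evaluation toolkits often interpolate between thresholds rather than picking the nearest one, so this sketch may differ slightly from published figures.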
Keywords/Search Tags: synthetic speech detection, recurrent neural network, support vector machine, Mel frequency cepstral coefficients, feature extraction