Research On Synthetic And Converted Speech Detection Based On Multi-branch Convolutional Neural Network

Posted on: 2024-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wen
GTID: 2568307112976399
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of deep learning theory and technology, speaker recognition based on deep learning is gradually moving towards practical application. However, fake speech generated by deep-learning-based speech synthesis and voice conversion poses significant risks to speaker verification systems. As an important auxiliary task for speaker verification, countermeasures against speech spoofing have become a hot topic in the field of speaker recognition. Malicious voice attacks pose a serious threat to social security, so research on speech anti-spoofing is urgently needed and highly significant. Spoofing countermeasures are maturing, but when faced with unknown attacks in practical scenarios, fake audio detection systems generalize poorly. It is therefore important to develop generalized countermeasures against unknown voice attacks. At the same time, to reduce the impact of complex and diverse codecs and transmission channels, the training data should be appropriately augmented to improve the robustness of the system to real-world fake speech. Based on the Gaussian mixture model (GMM) and convolutional neural networks, this thesis makes the following contributions to speech spoofing detection:

(1) This thesis proposes a speech spoofing detection system based on a MobileNet variant, in which a Gaussian mixture model acts as the feature extractor. The system is then further optimized into the mGMM-MobileNet (multi-path Gaussian Mixture Model-MobileNet) model, which builds on data augmentation and a multi-branch network structure. The model uses different GMMs to fit the feature distributions of training data produced by different augmentation methods, and the resulting log-Gaussian probability (LGP) features are fed into different branch networks. The equal error rates of mGMM-MobileNet on ASVspoof 2021 LA and DF are 4.10% and 15.85%, respectively. The experimental results show that a multi-branch network structure based on data augmentation can effectively improve the discrimination and generalization capabilities of the model in speech spoofing detection.

(2) To address the insufficient discrimination capability of mGMM-MobileNet on the evaluation dataset, residual learning blocks are used to build the network model. To reduce the computational cost, a Group GMM-ResNet model based on feature grouping and a multi-branch network structure is proposed for speech spoofing detection. Based on the training method of the Gaussian mixture model, grouping methods for the LGP features are proposed, and the grouped LGP features are fed into different branch networks to extract deep embeddings. The equal error rates of Group GMM-ResNet on ASVspoof 2021 LA and DF are 2.56% and 16.76%, respectively. The experimental results show that LGP feature grouping and the multi-branch network structure can improve the accuracy of the model in speech spoofing detection.
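
To make the log-Gaussian probability features mentioned above more concrete, the following is a minimal sketch and not the thesis implementation. It assumes a GMM with diagonal covariances whose parameters are already trained, and it takes the feature for frame t and component k to be the log density of the frame under that component. The function names `lgp_features` and `split_into_groups` are hypothetical, and the contiguous split is only a placeholder for the grouping methods the thesis proposes, which the abstract does not spell out.

```python
import numpy as np


def lgp_features(frames, means, variances):
    """Per-component log-Gaussian probabilities (assumed diagonal covariances).

    frames:    (T, D) acoustic feature frames
    means:     (K, D) component means of a trained GMM
    variances: (K, D) diagonal covariances of the same GMM
    Returns a (T, K) array whose entry [t, k] is log N(x_t; mu_k, diag(var_k)).
    """
    frames = np.asarray(frames, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dim = frames.shape[1]

    # Broadcast to (T, K, D) so every frame is compared with every component.
    diff = frames[:, None, :] - means[None, :, :]
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                      # (K,)

    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det[None, :] + mahalanobis)


def split_into_groups(lgp, n_groups):
    """Placeholder grouping (an assumption): equal contiguous slices of the
    component axis, one slice per branch network."""
    return np.split(lgp, n_groups, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(5, 3))        # 5 frames, 3-dimensional features
    means = rng.normal(size=(4, 3))         # 4 Gaussian components
    variances = np.ones((4, 3))             # unit variances for the demo
    features = lgp_features(frames, means, variances)
    branches = split_into_groups(features, 2)
    print(features.shape, [b.shape for b in branches])
```

In a multi-branch model of the kind described, each element of `branches` would be the input to its own branch network, so each branch processes only a fraction of the Gaussian components.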
Keywords/Search Tags: speech spoofing detection, multi-branch convolutional neural network, log-Gaussian probability feature, speaker verification