Research On Speech Synthesis And Spoof Detection Based On Deep Learning

Posted on: 2023-12-14

Degree: Master

Type: Thesis

Country: China

Candidate: H D Xiao

GTID: 2568307103492774

Subject: Software engineering

Abstract/Summary:

Intelligent voice has become an important mode of human-computer interaction. Speech synthesis, a core component of intelligent voice technology, converts text into a speech signal that reads the text aloud, giving machines the ability to speak like a human. As speech synthesis technology develops and spreads, demand for personalized multi-speaker speech synthesis keeps growing. At the same time, criminals use speech synthesis and related technologies to forge other people’s voices, attack voice identity authentication systems, and carry out telecommunications fraud, which creates security risks for society. Studying speech forgery detection methods that accurately identify forged speech is therefore of great social significance for protecting citizens’ property and privacy.

Existing deep-learning-based speech synthesis and speech forgery detection methods have the following problems: (1) existing multi-speaker speech synthesis methods do not sufficiently fuse the features of the reference speech, and their constraints on the timbre consistency of the synthesized speech are insufficient; (2) speech forgery detection models generalize poorly and struggle to cope with increasingly mature speech synthesis and voice conversion methods.

To address problem (1), this thesis proposes a multi-speaker speech synthesis method based on adversarial learning. A text-speech feature fusion method based on affine transformation is designed, in which the affine transformation parameters are predicted from speech features. A timbre-aware discriminator is also designed by introducing an adversarial mechanism, and the acoustic model is trained against it to improve the similarity between the synthesized speech and the target speaker’s speech.

To address problem (2), this thesis proposes a speech forgery detection method based on data augmentation. A data augmentation method based on frequency-domain exchange improves the generalization of the detection model, and a time-frequency attention mechanism calibrates the speech features so that the model pays more attention to the time steps and frequency bands that carry salient forgery information. The equal error rate on the ASVspoof LA dataset reaches 1.88%.

In addition, the proposed speech synthesis method is used to design and implement Slide2 Video, a system that automatically generates presentation videos from presentation slides and speech scripts, which is convenient for researchers taking part in international online academic conferences.
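For readers unfamiliar with the metric, the equal error rate (EER) reported in the abstract is the operating point at which a detector's false acceptance rate and false rejection rate are equal. The sketch below shows one simple way to estimate it from detector scores. It is not taken from the thesis: the function name `equal_error_rate`, the score convention (a higher score means "more likely bona fide"), and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumed convention (hypothetical, not from the thesis): a trial is
    accepted as bona fide when its score is >= the threshold.
    Returns (eer, threshold).
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest;
        # report their mean as the EER estimate.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for this example only.
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the two error rates may never cross exactly, which is why the sketch averages them at the closest point. Evaluation toolkits typically interpolate along the error curve instead.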

Keywords/Search Tags:

Deep Learning, Speech Synthesis, Speech Spoofing Detection, Attention Mechanism

Related items

1	Research On Detection Algorithm Of Speech Spoofing And Its System Implementation
2	Speech Anti-spoofing Based On Deep Learning
3	Research And Implementation Of Speech Synthesis Based On FastSpeech
4	Research On Speech Spoofing Detection Based On Attention Mechanism And End-to-End Model
5	Research And Application Of Speech Synthesis Technology Based On Deep Learning
6	Research On Speech Synthesis Based On Deep Learning
7	Research On Algorithms Of Speech Synthesis Based On Deep Neural Network
8	Research On Deep Learning Based End-to-End Chinese Speech Synthesis
9	Research On Personalized Speech Synthesis Based On Deep Speech Representations
10	Improved Tacotron2 Speech Synthesis Method Based On Forced Monotonic Attention Mechanism