Investigation On Deep Learning Based Voice Conversion

Posted on: 2019-09-19

Degree: Master

Type: Thesis

Country: China

Candidate: J H Lai

Full Text: PDF

GTID: 2428330590992284

Subject: Computer technology

Abstract/Summary:

Voice conversion is a speech processing technique that converts source speech into speech with another style. It has many applications. The most obvious is generating speech databases for text-to-speech (TTS) from limited data; it also plays an important role in speech restoration, speech translation, and some security-related applications. Speaker conversion is the most important task in voice conversion and is the main research topic of this thesis. Voice conversion falls into two categories depending on the content of the database: conversion with parallel data and conversion with non-parallel data. With parallel data, the database contains speech from the source and target speakers with the same content; with non-parallel data, the database contains only speech with different content, or with a small portion of shared content.

The thesis first proposes a phone-aware voice conversion framework based on neural networks. Automatic speech recognition (ASR) is used to extract the phoneme alignment of the speech. Voice activity detection (VAD) is used to obtain more precise speech boundaries for the extracted phonemes. With the help of this phoneme information, an improved dynamic time warping (DTW) algorithm produces a more accurate frame-level alignment. Finally, LSTM-RNNs are used as the conversion model, converting spectral features with the help of phoneme features. The evaluation shows that the proposed phone-aware LSTM-RNN system performs significantly better than the baseline LSTM-RNNs in both objective and subjective evaluations.

The thesis also proposes a dual learning based voice conversion framework that does not require a large amount of parallel training data. A small amount of parallel data is used to train initial voice conversion models. The dual learning mechanism then trains the spectral conversion models from speaker A to speaker B and from speaker B to speaker A simultaneously. A spoofing detection model serves as supervision to keep the intermediate spectral features from becoming distorted. The experiment shows that the dual learning framework improves the initial conversion models in subjective evaluation while the objective measures stay within the normal range, which shows that dual learning, supervised by spoofing detection, can effectively use non-parallel data to improve the conversion models.
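The abstract does not say how the improved DTW uses the phoneme information. As a rough illustration only, the sketch below assumes one plausible scheme: ordinary DTW over per-frame feature vectors, with a fixed penalty added to the local cost whenever the two frames carry different phoneme labels. The function name `phone_aware_dtw`, the `mismatch_penalty` parameter, and the Euclidean local distance are all assumptions made for this example, not details taken from the thesis.

```python
import math


def phone_aware_dtw(src, tgt, src_phones, tgt_phones, mismatch_penalty=10.0):
    """Align two frame sequences with DTW, penalising phoneme mismatches.

    src, tgt: lists of equal-dimension feature vectors, one per frame.
    src_phones, tgt_phones: one phoneme label per frame (e.g. from ASR).
    mismatch_penalty: hypothetical constant added when labels differ.
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    """
    n, m = len(src), len(tgt)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    if len(src_phones) != n or len(tgt_phones) != m:
        raise ValueError("one phoneme label is required per frame")

    inf = float("inf")
    # acc[i][j] is the cheapest cost of aligning src[:i] with tgt[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(src[i - 1], tgt[j - 1])
            if src_phones[i - 1] != tgt_phones[j - 1]:
                local += mismatch_penalty  # assumed phone-aware term
            acc[i][j] = local + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )

    # Backtrack from the end to recover the frame-level alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


if __name__ == "__main__":
    # Toy 1-D "spectral" frames with made-up phoneme labels.
    source = [[0.0], [0.0], [1.0]]
    target = [[0.0], [1.0], [1.0]]
    cost, alignment = phone_aware_dtw(
        source, target, ["a", "a", "b"], ["a", "b", "b"]
    )
    print(cost, alignment)
```

In this toy setting the penalty steers the warping path toward frame pairs that share a phoneme label. In a parallel-data pipeline, the aligned frame pairs would then serve as input and target examples for the conversion model.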

Keywords/Search Tags:

Voice Conversion, Neural Network, Dynamic Time Warping, Dual Learning, Spoofing Detection

Related items

1	Study On Voice Spoofing Detection Based On Deep Learning
2	The Research And Implementation Of Voice Conversion Technology
3	Speech Spoofing Detection Based On Dense Neural Network
4	Research Of Voice Conversion Based On Frequency Warping Method
5	Research On Detection Algorithm Of Speech Spoofing And Its System Implementation
6	Age-Voice Conversion System Driven By Multi-Parameter
7	Research On High Quality Voice Conversion Algorithm Based On Improved GMM And Frequency Warping
8	Voice Application System Technology Research Based On Voice Sample Matching
9	Study On The Neural Network Modelling Method For Voice Conversion
10	Research And Implementation Of Voice Conversion System Based On Deep Learning