Research And Implementation Of Voice Conversion System Based On Deep Learning

Posted on:2022-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2518306314451754

Subject:Software engineering

Abstract/Summary:

As artificial intelligence products become widespread, voice technology has attracted growing attention from practitioners. Voice conversion systems are one application of deep-learning-based speech technology. Most existing voice conversion systems do not consider how noise affects conversion performance in use, yet noise is a factor that cannot be ignored. Many noise-reduction methods exist, but neural network models for this task can easily exceed hundreds of megabytes, which limits their applicability in voice conversion systems. This thesis uses a lightweight Fully Convolutional Neural Network (FCNN) to improve the autoencoder-based voice conversion model with a bottleneck (AutoVC), raising the performance of the voice conversion system on noisy speech.

The voice conversion system in this thesis is built on deep learning and converts noisy speech to the voice of a specific target speaker. In the proposed system, voice conversion consists of five steps: user registration or login, direct recording or selection of the speech to be converted, noise-reduction preprocessing of the speech, conversion of the denoised speech, and display of the converted result. To improve the robustness of the system to noise, this thesis uses DNAutoVC, a bottleneck-based AutoVC system that adds a noise-reduction preprocessing module to the voice conversion pipeline. The input speech is passed through a fully convolutional neural network composed of one-dimensional convolutions and frequency-expanded two-dimensional convolutions, combined with residual learning and skip connections. This network denoises the noisy speech, and the preprocessed spectrogram is fed to the content encoder to obtain the content information of the source speech. At the same time, the spectrogram of the specific target speech is fed to the speaker encoder to obtain the speaker information of the target speech. The source content information and the target speaker information are then concatenated and fed to the decoder, which produces the converted speech.

The system uses a browser/server (B/S) architecture, so users do not need to download a dedicated client and can perform voice conversion directly in the browser. Theoretical analysis and experiments show that the proposed deep-learning-based voice conversion system improves conversion performance on noisy speech compared with the unmodified bottleneck-based AutoVC system.
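The data flow described in the abstract (denoise, encode content, encode speaker, concatenate, decode) can be illustrated with a toy sketch. This is not the thesis's implementation: every function name below is hypothetical, the denoiser is a simple per-bin floor subtraction standing in for the FCNN, and the encoders are plain pooling operations chosen only to make the shapes visible. The sketch stops at the concatenated decoder input, because the abstract does not describe the decoder itself.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins


def denoise(spec: Spectrogram) -> Spectrogram:
    """Stand-in for the FCNN denoiser (assumption): subtract each bin's minimum over time."""
    floors = [min(col) for col in zip(*spec)]
    return [[max(v - f, 0.0) for v, f in zip(frame, floors)] for frame in spec]


def content_encoder(spec: Spectrogram, dim: int, stride: int) -> Spectrogram:
    """Toy bottleneck (assumption): pool bins into `dim` groups, keep every `stride`-th frame.

    Assumes `dim` divides the number of frequency bins.
    """
    group = len(spec[0]) // dim
    return [
        [sum(frame[i * group:(i + 1) * group]) / group for i in range(dim)]
        for frame in spec[::stride]
    ]


def speaker_encoder(spec: Spectrogram) -> List[float]:
    """Toy speaker embedding (assumption): average each bin over time."""
    n_frames = len(spec)
    return [sum(col) / n_frames for col in zip(*spec)]


def decoder_input(codes: Spectrogram, speaker: List[float],
                  stride: int, n_frames: int) -> Spectrogram:
    """Upsample content codes back to n_frames and append the speaker vector to each frame."""
    upsampled = [codes[t // stride] for t in range(n_frames)]
    return [code + speaker for code in upsampled]


# Synthetic inputs: a 6-frame, 8-bin "noisy source" and a 5-frame, 4-bin "target".
source = [[0.1 * (t + b) for b in range(8)] for t in range(6)]
target = [[0.05 * (t * b + 1) for b in range(4)] for t in range(5)]

clean = denoise(source)                            # 6 frames x 8 bins
codes = content_encoder(clean, dim=2, stride=2)    # 3 frames x 2 dims
speaker = speaker_encoder(target)                  # 4 dims, independent of length
frames = decoder_input(codes, speaker, stride=2, n_frames=len(clean))  # 6 x (2 + 4)
```

The point of the sketch is the shape bookkeeping: the content path is squeezed in both time and dimension, the speaker path collapses to one fixed-length vector, and the two are joined frame by frame before decoding.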

Keywords/Search Tags:

voice conversion, noise reduction, deep learning, spectrogram, fully convolutional neural network

Related items

1	Research On Multimodal Voice Conversion Under Adverse Environment Using Deep Convolutional Neural Network
2	Investigation On Deep Learning Based Voice Conversion
3	Study On The Neural Network Modelling Method For Voice Conversion
4	Research On Deep Learning Image Noise Reduction Based On Multi-Stage Supervision
5	A Study On Deep Learning-Based Voice Conversion For Identity Disguise In Voice Communication
6	Mongolian Voice Conversion System Based On Deep Learning
7	Monaural Singing Voice Separation Using Deep Learning
8	Multiple-phase Shift Keying Recognition Based On Convolutional Neural Network
9	Study On Voice Spoofing Detection Based On Deep Learning
10	Research Of Audio Classification Algorithms Based On Convolutional Neural Network And Its Applications