Research And Implementation Of Voice Conversion System Based On Deep Learning

Posted on: 2022-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2518306314451754
Subject: Software engineering
Abstract/Summary:
As artificial intelligence products become widespread, speech technology is attracting growing attention from practitioners, and voice conversion systems are one application of deep learning to speech. Most existing voice conversion systems do not account for the effect of noise on conversion performance, although noise cannot be ignored in practical use. Many noise reduction methods exist, but neural network models for this task can easily exceed hundreds of megabytes, which limits their applicability in voice conversion systems. This thesis uses a lightweight fully convolutional neural network (FCNN) to improve AutoVC, an autoencoder-based voice conversion model with a "bottleneck", so that the system performs better on noisy speech.

The system is built on deep learning and converts noisy source speech to the voice of a specific target speaker. Voice conversion in the system consists of five steps: user registration or login; recording, or selecting, the speech to be converted; noise reduction preprocessing; voice conversion of the denoised speech; and display of the conversion result. To improve robustness to noise, the thesis uses DNAutoVC, which adds a noise reduction preprocessing module to the bottleneck autoencoder voice conversion system. The input speech is first passed through a fully convolutional neural network composed of one-dimensional convolutions and frequency-dilated two-dimensional convolutions, combined with residual learning and skip connections, which denoises the noisy speech. The denoised spectrogram is then fed to the content encoder to extract the content information of the source speech, while the spectrogram of the target speech is fed to the speaker encoder to extract the speaker information of the target. The source content information and the target speaker information are concatenated and passed to the decoder, which produces the converted speech.

The system uses a browser/server (B/S) architecture, so users can perform voice conversion in a browser without downloading a dedicated client. Theoretical analysis and experiments show that the proposed system improves conversion performance on noisy speech compared with the unmodified bottleneck autoencoder voice conversion system.
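The data flow described in the abstract (denoise the source, encode its content, encode the target speaker, concatenate, decode) can be sketched in a few lines of plain Python. This is only an illustration of how the stages connect, not the thesis implementation: the function names, the list-based spectrogram type, and all four "toy" stand-ins are assumptions made for this example, and the real modules are trained neural networks.

```python
from typing import Callable, List

Frame = List[float]        # one spectrogram frame (hypothetical representation)
Spectrogram = List[Frame]  # time-major list of frames


def concat_content_and_speaker(content: Spectrogram, speaker: Frame) -> Spectrogram:
    """Append the same speaker embedding to every content frame."""
    return [frame + speaker for frame in content]


def convert(
    noisy_source: Spectrogram,
    target: Spectrogram,
    denoiser: Callable[[Spectrogram], Spectrogram],
    content_encoder: Callable[[Spectrogram], Spectrogram],
    speaker_encoder: Callable[[Spectrogram], Frame],
    decoder: Callable[[Spectrogram], Spectrogram],
) -> Spectrogram:
    """Wire the stages together in the order the abstract describes."""
    clean = denoiser(noisy_source)      # noise reduction preprocessing
    content = content_encoder(clean)    # content of the source speech
    speaker = speaker_encoder(target)   # identity of the target speaker
    return decoder(concat_content_and_speaker(content, speaker))


# Toy stand-ins (assumptions for illustration only, not trained models).
def toy_denoiser(spec: Spectrogram) -> Spectrogram:
    # Pretend negative values are noise and clip them to zero.
    return [[max(v, 0.0) for v in frame] for frame in spec]


def toy_content_encoder(spec: Spectrogram, dim: int = 2) -> Spectrogram:
    # Crude "bottleneck": keep only the first `dim` values of each frame.
    return [frame[:dim] for frame in spec]


def toy_speaker_encoder(spec: Spectrogram) -> Frame:
    # Average over time to get one fixed-size vector per utterance.
    n = len(spec)
    return [sum(frame[i] for frame in spec) / n for i in range(len(spec[0]))]


def toy_decoder(spec: Spectrogram) -> Spectrogram:
    # Identity, so the concatenated decoder input can be inspected directly.
    return spec


source = [[1.0, -2.0, 3.0], [4.0, 5.0, -6.0]]
target = [[1.0, 1.0], [3.0, 5.0]]
converted = convert(
    source, target,
    toy_denoiser, toy_content_encoder, toy_speaker_encoder, toy_decoder,
)
```

In this sketch the speaker vector is computed once per target utterance and repeated on every content frame, which is one common way to realize the concatenation step; the abstract does not specify how the thesis implements it.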
Keywords/Search Tags: voice conversion, noise reduction, deep learning, spectrogram, fully convolutional neural network