Singing Voice Conversion Based On CBAM And Dynamic Channel Fusion

Posted on:2024-08-21

Degree:Master

Type:Thesis

Country:China

Candidate:S H Gao

Full Text:PDF

GTID:2568307136491794

Subject:Electronic information

Abstract/Summary:

Voice conversion (VC) is a speech technology that changes the speaker identity of an utterance while keeping the content of the source voice unchanged. Singing voice conversion, an important branch of voice conversion, has many applications in multimedia entertainment and voice interaction systems. With the development of artificial intelligence and neural networks, singing voice conversion has progressed rapidly, and various classical conversion models have achieved good performance. In practical applications, a mature singing voice conversion technique must not only convert identity between different singing voices well, but also perform well in the open-set case, where only normal speech of the target is available. In addition, the operational efficiency of the model directly affects the storage and computational resources required in practice. This thesis therefore investigates two aspects, improving singing voice conversion performance and improving model efficiency, and proposes a series of improvements.

First, to realize singing voice conversion effectively and broaden its application, this thesis proposes the StyleGAN singing voice conversion model, which extracts the identity information of the target singer through a style encoder and achieves good conversion performance. It then introduces the CBAM (Convolutional Block Attention Module) attention mechanism into the generator and proposes the C-StyleGAN singing voice conversion model. This improves the generative and expressive ability of the model without increasing the depth or width of the network, enhances the extraction of detail in the singing voice spectrum, and effectively improves the quality of the converted singing voice. Subjective and objective experiments show that, compared with the StarGAN model, the proposed C-StyleGAN model improves the average MOS (mean opinion score) by 36.18%, improves the ABX score by 16.55%, and reduces the MCD (mel-cepstral distortion) of the reconstructed singing voice by 13.60%. The model can also complete the conversion in the open-set case given only normal speech of the target, which removes the dependence on target singing recordings and broadens its range of application.

Second, to improve model efficiency, this thesis introduces dynamic channel fusion to replace the dynamic convolution in the generator and proposes the DC-StyleGAN singing voice conversion model. Dynamic channel fusion revisits dynamic convolution from the perspective of matrix decomposition: it achieves a significant dimensionality reduction in the latent space and alleviates the difficulty of jointly optimizing the dynamic attention and the static convolution kernels, thereby improving efficiency. Subjective and objective experiments show that, compared with the C-StyleGAN model, DC-StyleGAN has 66.87% fewer parameters and trains 34.09% faster, while the average MOS and average ABX scores are essentially unchanged. The optimization therefore substantially reduces the parameter count, accelerates training, and makes the model more lightweight while leaving conversion performance essentially unaffected.

In summary, the DC-StyleGAN singing voice conversion model proposed in this thesis converts well and can complete the conversion in the open-set case given only normal speech of the target. It also has high operational efficiency and low training cost, providing theoretical discussion and simulation results that support moving singing voice conversion toward practical application.
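The matrix-decomposition view of dynamic convolution mentioned above can be illustrated with a small sketch. It assumes the general formulation in which a dynamic weight W(x) = Λ(x)·W0 + P·Φ(x)·Qᵀ replaces a weighted sum of K full static kernels, shown for a 1x1 convolution at a single position. The function names, the sizes C, K and L, and the omission of the attention branch that produces Λ(x) and Φ(x) are all assumptions made for illustration; this is not the thesis's DC-StyleGAN generator.

```python
# Illustrative sketch only: names and sizes are assumptions, not taken from the thesis.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def dynamic_conv_params(c, k):
    """Static weights of vanilla dynamic convolution: K full C x C kernels."""
    return k * c * c

def channel_fusion_params(c, l):
    """Static weights of dynamic channel fusion: W0 (C x C) plus P and Q (C x L each)."""
    return c * c + 2 * c * l

def channel_fusion_forward(x, w0, p, q, lam, phi):
    """Compute y = Lambda(x) W0 x + P Phi(x) Q^T x for one position.

    lam (length C) and phi (L x L) stand in for the outputs of the
    input-dependent attention branch, which is omitted here.
    """
    static_out = [s * v for s, v in zip(lam, matvec(w0, x))]
    z = matvec(transpose(q), x)   # compress C channels into L latent channels
    z = matvec(phi, z)            # fuse channels dynamically in the latent space
    dynamic_out = matvec(p, z)    # expand back to C channels
    return [a + b for a, b in zip(static_out, dynamic_out)]
```

With the hypothetical sizes C = 256, K = 4 and L = 16, `dynamic_conv_params(256, 4)` counts 262144 static weights and `channel_fusion_params(256, 16)` counts 73728. These figures only show how the count scales when the dynamic part acts in an L-dimensional latent space; they are unrelated to the 66.87% reduction reported for the full model.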

Keywords/Search Tags:

voice conversion, singing voice conversion, CBAM attention mechanism, dynamic channel fusion, open-set case, operational efficiency

Related items

1	The Research And Implementation Of Voice Conversion Technology
2	Mongolian Voice Conversion System Based On Deep Learning
3	Cross-lingual Voice Conversion Based On Mutual Information And SE Attention Mechanism
4	Research On Speech Conversion Algorithms Based On Deep Convolutional Auto Encoder
5	Age-Voice Conversion System Driven By Multi-Parameter
6	Nonparallel-Corpus-Based Multi Speaker Voice Conversion
7	Emotional Voice Analysis And Conversion Based On Parallel Corpus
8	Research On Methods For Voice Conversion
9	Studies On Key Techniques For Voice Conversion
10	Emotional Voice Conversion Based On StyleGAN With Fundamental Frequency Difference Compensation