Research On Method Of Reconstructing Scene Image From Audio Based On Generative Adversarial Network

Posted on: 2022-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Yan
Full Text: PDF
GTID: 2518306524981249
Subject: Systems Engineering
Abstract/Summary:
Sound scene reconstruction is a new information processing method that reconstructs scene images from the features of sound events in audio signals. It has a wide range of applications in civil fields such as scene perception, security reconnaissance, multimedia analysis, and movie scene production. However, the generative adversarial network (GAN) theory for sound scene reconstruction is still immature, which leads to unstable training, and the resolution and quality of the reconstructed images need to be improved. Therefore, this thesis studies the GAN theory and methods for reconstructing scene images from audio, focusing on the attention mechanism, spectral normalization, UNet, ResNet, and the pyramid network. The main work of this thesis is as follows:

(1) A conditional GAN model is established, and sound features are extracted by combining the log-mel spectrogram with a convolutional neural network. The influence of four loss functions on training stability and image quality is then explored, and classification accuracy is proposed as a metric for evaluating the correlation between the reconstructed image and the audio.

(2) An improved conditional GAN model combining attention and spectral normalization is established. The model adds attention layers to the network structure and normalizes the spectral norms of the network's convolution kernel parameter matrices. Experiments show that the 64×64 images reconstructed by the model with attention layers are of higher quality, and that training is more stable with spectral normalization. On the test set, the classification accuracy of images reconstructed by the improved model increased by 3.9%.

(3) A cascaded conditional GAN model with two generators and two discriminators is established, in which the generators use a UNet architecture, the discriminators use PatchGAN, and the auxiliary classifier uses ResNet. Experiments show that the improved model not only generates 128×128 high-resolution sound scene images but also produces higher-quality images.

In summary, the improved conditional GAN models in this thesis effectively improve the resolution and quality of reconstructed sound scene images. The experimental results show that training of the improved models is more stable and that image classification accuracy is greatly improved, which provides an important reference and experimental basis for applying generative adversarial networks to sound scene reconstruction tasks.
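
The spectral normalization step mentioned in (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes NumPy, a convolution kernel of shape (out_channels, in_channels, height, width), and a hypothetical helper name `spectral_normalize`. It estimates the largest singular value of the flattened kernel by power iteration and divides the kernel by that estimate.

```python
import numpy as np

def spectral_normalize(weight, n_iter=100, eps=1e-12, seed=0):
    """Hypothetical helper: scale a conv kernel so its spectral norm is about 1."""
    # Flatten the kernel into a 2-D matrix of shape (out_channels, everything else).
    w = weight.reshape(weight.shape[0], -1)
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(w.shape[0])
    u /= np.linalg.norm(u) + eps
    # Power iteration: u and v converge to the top left/right singular vectors.
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    # Estimate of the largest singular value (the spectral norm).
    sigma = u @ w @ v
    return weight / sigma, sigma

# Illustrative kernel with assumed shape (8, 4, 3, 3); values are random.
kernel = np.random.default_rng(1).standard_normal((8, 4, 3, 3))
normalized, sigma = spectral_normalize(kernel)
```

Many iterations are run here so the estimate is accurate in isolation; training code commonly runs a single iteration per update and carries `u` over between steps.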
Keywords/Search Tags: Generative adversarial network, Generator, Attention mechanism, Discriminator, Spectral normalization