Research On Method Of Reconstructing Scene Image From Audio Based On Generative Adversarial Network

Posted on:2022-10-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Yan

Full Text:PDF

GTID:2518306524981249

Subject:Systems Engineering

Abstract/Summary:

Sound scene reconstruction is a new information processing method that reconstructs scene images from the features of sound events in audio signals. It has a wide range of applications in civil fields such as scene perception, security reconnaissance, multimedia analysis, and movie scene production. However, generative adversarial network (GAN) theory for sound scene reconstruction is still immature, which leads to unstable training, and the resolution and quality of the reconstructed images need to be improved. Therefore, this thesis studies GAN theory and methods for reconstructing scene images from audio, focusing on attention mechanisms, spectral normalization, U-Net, ResNet, and pyramid networks. The main contributions of this thesis are as follows:

(1) A conditional GAN model is established, in which sound features are extracted by combining a log-mel spectrogram with a convolutional neural network. The influence of four loss function models on training stability and image quality is then explored, and classification accuracy is proposed as a metric for evaluating the correlation between the reconstructed image and the audio.

(2) An improved conditional GAN model combining attention and spectral normalization is established. The model adds attention layers to the network structure and normalizes the spectral norms of the convolution kernel parameter matrices of the network. Experiments show that the 64×64 images reconstructed by the model with attention layers are of higher quality, and that training is more stable with spectral normalization. On the test set, the classification accuracy of the images reconstructed by the improved model increased by 3.9%.

(3) A cascaded conditional GAN model with two generators and two discriminators is established, in which the generator uses a U-Net architecture, the discriminator uses PatchGAN, and the auxiliary classifier uses ResNet. Experiments show that the improved model not only generates higher-resolution 128×128 sound scene images but also produces images of higher quality.

In summary, the improved conditional GAN models in this thesis effectively improve the resolution and quality of reconstructed sound scene images. The experimental results show that the training process of the improved models is more stable and that image classification accuracy is greatly improved, which provides an important reference and experimental basis for applying generative adversarial networks to sound scene reconstruction tasks.
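
The spectral normalization mentioned in item (2) of the abstract can be illustrated with a small, dependency-free sketch. It is not the thesis's implementation: the function names (`spectral_norm`, `spectrally_normalize`), the kernel layout `[out][in][kh][kw]`, and the number of power iterations are all assumptions made for illustration. The idea shown is the standard one: flatten each convolution kernel into a 2-D matrix, estimate its largest singular value by power iteration, and divide the weights by that value.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def spectral_norm(matrix, n_iters=50, seed=0):
    """Estimate the largest singular value of `matrix` by power iteration."""
    rng = random.Random(seed)
    matrix_t = [list(col) for col in zip(*matrix)]  # transpose
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in matrix])
    for _ in range(n_iters):
        v = l2_normalize(matvec(matrix_t, u))  # right singular vector estimate
        u = l2_normalize(matvec(matrix, v))    # left singular vector estimate
    # sigma is approximately u^T W v
    return sum(ui * wi for ui, wi in zip(u, matvec(matrix, v)))


def spectrally_normalize(kernel, n_iters=50):
    """Flatten a conv kernel (assumed layout [out][in][kh][kw]) to 2-D
    and divide every weight by the estimated spectral norm."""
    flat = [[w for plane in out_ch for row in plane for w in row]
            for out_ch in kernel]
    sigma = spectral_norm(flat, n_iters)
    return [[w / sigma for w in row] for row in flat], sigma


if __name__ == "__main__":
    # Hypothetical toy kernel: 2 output channels, 1 input channel, 1x2 filters.
    toy_kernel = [[[[3.0, 0.0]]], [[[0.0, 1.0]]]]
    normalized, sigma = spectrally_normalize(toy_kernel)
    print("estimated spectral norm:", sigma)
    print("normalized weights:", normalized)
```

For the toy kernel, the flattened matrix is diagonal with entries 3 and 1, so the estimated spectral norm is approximately 3 and the normalized matrix has a spectral norm of approximately 1. Deep learning frameworks usually keep the vector `u` between training steps and run only one or a few power iterations per step; the 50 iterations here are chosen only to make the standalone estimate accurate.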

Keywords/Search Tags:

Generative adversarial network, Generator, Attention mechanism, Discriminator, Spectral normalization

Related items

1	Research On Expression Synthesis Algorithm Based On Generative Adversarial Networ
2	Research On Image Style Transfer Method Based On Generative Adversarial Network
3	Research On Image Inpainting Method Based On Generative Adversarial Network
4	Research On The Construction Of Complex Generative Adversarial Network
5	Research On Image Inpainting Algorithm Based On Generative Adversarial Network
6	Image Inpainting Based On Generative Adversarial Networks
7	Research On Image Steganography Based On Generative Adversarial Networks
8	Research On Adversarial Attack On Steganography Based On Dual-discriminator Generative Adversarial Network
9	Research On Generative Adversarial Network For Text-to-Image Synthesis
10	Research On Image Data Augmentation Based On Generative Adversarial Network