
Research On Feature Extraction And Fusion Of Audio Visual Information

Posted on: 2022-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Jiang
Full Text: PDF
GTID: 2518306524481254
Subject: Systems Engineering
Abstract/Summary:
With the rapid development of artificial intelligence technology, methods that use image or sound information to represent a target are increasing. Because of the diversity and complexity of information in the target's physical environment, it is difficult to fully represent a perceived target using visual or auditory information alone. This thesis therefore studies feature extraction and fusion methods for visual and auditory information, in order to realize comprehensive processing, fusion, and perception of a target's audio-visual information in low-SNR environments. The main research work is as follows.

First, this thesis establishes a data set containing 900 seconds of audio and 1,150 images. To reflect the noise interference and abnormal gain found in real environments, the initial data set is augmented by varying the signal-to-noise ratio, gain, and other parameters, yielding an augmented data set of 9,955 seconds of audio and 12,595 images.

Second, this thesis analyzes how the ordering of operations in the residual structure affects network performance, and proposes an auditory feature extraction model based on an improved residual structure and a visual feature extraction model based on a multilayer convolutional neural network. Classification experiments with the proposed models on the public data sets ESC-50 and CIFAR-10 and on the test set established in this thesis, compared against the pre-trained models VGGish and VGG19, demonstrate the effectiveness of the feature extraction models.

Third, building on model fusion theory, feature concatenation, and the correspondence autoencoder, an improved audio-visual information fusion model based on the correspondence autoencoder is proposed. On top of the autoencoder, the model adds an association loss between the hidden-layer representations of the audio and visual streams, so as to obtain a shared hidden-layer representation of the audio-visual information; a regularization term is added to the loss function to avoid over-fitting of the hidden-layer representation while keeping the hidden-layer information usable.

Finally, the F1-score metric and the t-SNE method are used to evaluate and analyze the experimental results of the above feature extraction and fusion methods. When only auditory information is used, the highest target recognition accuracy is 47.5% with an F1 score of 0.407; when only visual information is used, the highest accuracy is 60.8% with an F1 score of 0.611; with the correspondence-autoencoder-based audio-visual fusion method, the accuracy reaches 84.2% with an F1 score of 0.846, which is at least 23.4 percentage points higher in accuracy and at least 0.235 higher in F1 score than either single-modality representation, effectively improving target perception performance in low-SNR environments.
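The augmentation step described above, mixing noise at a controlled signal-to-noise ratio and applying a gain, can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual pipeline; the function names and the use of white noise are assumptions.

```python
import numpy as np

def add_noise_at_snr(clean, snr_db, rng=None):
    """Mix white noise into a clean signal at a target SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(clean))
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(p_signal / p_scaled_noise) == snr_db
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def apply_gain(signal, gain_db):
    """Apply a fixed gain (in dB) to simulate abnormal amplification."""
    return signal * 10.0 ** (gain_db / 20.0)
```

Sweeping `snr_db` and `gain_db` over a grid of values would expand each clean recording into many noisy variants, which matches the scale of growth reported for the data set.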
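The fusion objective described above, reconstruction plus a hidden-layer association loss plus a regularization term, can be sketched as a single loss function. This is a simplified NumPy sketch under the assumption of mean-squared-error reconstruction and L2 regularization; the names `alpha` and `lam` and the exact form of each term are illustrative, not taken from the thesis.

```python
import numpy as np

def corr_ae_loss(x_a, x_v, z_a, z_v, xhat_a, xhat_v, weights,
                 alpha=1.0, lam=1e-4):
    """Combined loss sketch for a correspondence autoencoder.

    x_a, x_v       : audio / visual input features
    z_a, z_v       : their hidden-layer representations
    xhat_a, xhat_v : reconstructions of the inputs
    weights        : weight matrices, penalized by the L2 regularizer
    """
    # Reconstruction losses for the two autoencoder branches
    rec = np.mean((x_a - xhat_a) ** 2) + np.mean((x_v - xhat_v) ** 2)
    # Association loss: pull paired audio/visual hidden codes together
    assoc = np.mean((z_a - z_v) ** 2)
    # Regularization term to curb over-fitting of the hidden representation
    reg = sum(np.sum(w ** 2) for w in weights)
    return rec + alpha * assoc + lam * reg
```

Minimizing the association term drives the audio and visual hidden codes toward a shared representation, which is what the fused recognition stage then consumes.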
Keywords/Search Tags:Object Perception, Feature Extraction, Audio-Visual Fusion, Neural Network, Correspondence Autoencoder