With the continuous development of computers, the Internet, and big data, the volume of multimedia data such as images has exploded. To help users retrieve the information they need from massive image collections, images must be classified and labeled. However, because of the “semantic gap” between an image and its actual semantics, images cannot be annotated accurately from single-modality image features alone. Many images published on the Internet are accompanied by related text. Building on extensive research in multimodal learning, multimodal techniques can combine image and text content to narrow the semantic gap in image annotation and improve the accuracy of information retrieval. In addition, many complex images contain more than one object, so multiple label categories must be identified for a single image. These objects are not completely independent: the labels describing the same picture may be correlated, and this correlation is of great significance for improving the performance of multi-label image classification. To improve the accuracy of image annotation and multi-label image classification, the specific research contents of this thesis are as follows:

1. An image classification and annotation algorithm based on multimodal deep multiple kernel learning is proposed. To identify the category labels of images more completely and accurately, the method combines two modalities, image and text. To address the data heterogeneity problem in multimodal learning, this thesis uses deep multiple kernel learning to fuse the content of the two modalities. First, features are extracted separately from the image and text modalities, and multiple single kernel functions are obtained for each modality; then a deep neural network combines these kernel functions into a final fused kernel function; finally, the fused kernel is passed to the
Support Vector Machine (SVM) classifier. In this thesis, the multimodal deep multiple kernel learning algorithm for image classification and annotation is applied to two multimodal datasets, CrisisMMD and Pascal VOC2007, which have 5 and 20 categories, respectively. The experimental results verify the effectiveness of the proposed algorithm.

2. A multi-label classification method based on a correlation multimodal deep kernel network is proposed. To improve the performance of multi-label image classification, this thesis proposes a label correlation model and introduces it into the multimodal deep kernel network to annotate multi-label images. First, the adjacency matrix of the label graph is combined with the distance matrix of the label semantic word vectors to obtain a high-order correlation model between the labels; then the initial multi-label confidence matrix produced by the multimodal deep multiple kernel learning algorithm is fed into the label correlation model; finally, the optimized label score vectors are used as the input-layer node features of a graph convolutional neural network, and after the network is trained, the final multi-label classification result is output. Experiments are conducted on the multi-label dataset Pascal VOC2007. The experimental results show that the proposed label correlation model can effectively improve the performance of multi-label image classification.
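
The label correlation step in the second contribution can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the adjacency matrix is built from conditional label co-occurrence frequencies, uses cosine similarity between word vectors in place of the distance matrix, mixes the two with a hypothetical weight `alpha`, and models only first-order correlation. The graph convolutional network that follows this step is omitted, and all function names are illustrative.

```python
import math


def cooccurrence_adjacency(label_rows):
    """Estimate adj[i][j] = P(label j present | label i present).

    label_rows is a list of binary label vectors, one per training image.
    (Assumption: the thesis's adjacency matrix is co-occurrence based.)
    """
    n = len(label_rows[0])
    counts = [[0] * n for _ in range(n)]
    for row in label_rows:
        present = [k for k in range(n) if row[k]]
        for i in present:
            for j in present:
                counts[i][j] += 1
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        occurrences = counts[i][i]  # number of images containing label i
        for j in range(n):
            if i != j and occurrences > 0:
                adj[i][j] = counts[i][j] / occurrences
    return adj


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_correlation(label_rows, word_vectors, alpha=0.5):
    """Blend co-occurrence and word-vector similarity, then row-normalise.

    alpha is a hypothetical mixing weight, not a parameter from the thesis.
    """
    adj = cooccurrence_adjacency(label_rows)
    n = len(adj)
    corr = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(1.0)  # every label is fully correlated with itself
            else:
                sim = max(0.0, cosine_similarity(word_vectors[i], word_vectors[j]))
                row.append(alpha * adj[i][j] + (1.0 - alpha) * sim)
        total = sum(row)
        corr.append([value / total for value in row])
    return corr


def refine_scores(scores, corr):
    """Propagate initial label confidences through the correlation matrix."""
    n = len(scores)
    return [sum(corr[i][j] * scores[j] for j in range(n)) for i in range(n)]


# Toy usage: three labels, four training images, 2-d "word vectors".
toy_labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_corr = label_correlation(toy_labels, toy_vectors)
toy_refined = refine_scores([0.9, 0.3, 0.2], toy_corr)
```

In this toy setting the second label always co-occurs with the first and shares its word vector, so its low initial confidence is pulled up by the first label's high confidence, while the unrelated third label is left unchanged. In the method described above, the refined scores would then serve as node features for the graph convolutional network.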