Images are an important carrier of information exchanged between people and the outside world. Because real-world images are often complex, with multiple semantic objects and multiple labels, multi-label image classification is a focus of current image classification research. Although many deep-learning-based multi-label image classification algorithms have achieved some success, the task still faces three major challenges. First, an image contains multiple objects of different sizes, so small objects are easily ignored during classification. Second, occlusion occurs in images, which weakens the performance of the classification network. Third, because an image usually corresponds to multiple categories, the output space grows exponentially with the number of labels. To address these difficulties, this thesis proposes two multi-label image classification methods. The work is reflected mainly in the following three aspects:

(1) For the multi-label image classification task, this thesis studies the classical related algorithms, summarizes the shortcomings of the related techniques, and builds its innovations on the basis of current algorithms.

(2) To address the fact that multi-label images contain multiple semantic objects, a multi-branch integrated multi-label image classification method is proposed. The method makes full use of the distinct characteristics of features from different layers of a Convolutional Neural Network (CNN), so that different features handle objects with different requirements, which alleviates the problem that small objects are easily ignored. On the one hand, high-level CNN features are strong in semantics but weak in detail, whereas low-level features are strong in detail but weak in semantics, so branches are derived from different layers. Without changing its original features, each branch is integrated with features from the other branches to add more feature information, and global information is injected through an attention mechanism to further extract features. On the other hand, the branches each predict the categories separately; an optimal selection module then selects, for each category, the branch with the best performance and takes that branch's result as the network's final prediction for that category. Experimental results show that this method effectively improves the accuracy of the network.

(3) A hybrid-network multi-label image classification method based on CNN and Transformer is proposed. Building on the different features that can be extracted from different layers of a CNN, the method combines the CNN with a Transformer model. On the one hand, the multi-head attention mechanism in the Transformer recognizes objects under occlusion and viewpoint change more effectively. On the other hand, the cross-attention built into the decoder is used to realize adaptive feature extraction. At the same time, label embeddings are learned directly from the data to capture the relationships between labels without a graph network, which alleviates the problem of exponential growth of the output space. The CNN and the Transformer complement each other through information interaction and thereby extract more powerful features. Experiments show that this method effectively improves the classification accuracy of the network.
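
The per-category branch selection described in aspect (2) can be illustrated with a minimal sketch. The abstract does not state how the optimal selection module decides which branch is best, so the sketch assumes a per-category validation score (for example, average precision) is available for every branch. The function names `select_best_branches` and `fuse_predictions`, the three-branch layout, the toy numbers, and the 0.5 decision threshold are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def select_best_branches(val_scores: Sequence[Sequence[float]]) -> List[int]:
    """For each category, return the index of the best-scoring branch.

    val_scores[b][c] is an assumed validation score (e.g. average precision)
    of branch b on category c. Ties go to the lowest branch index.
    """
    num_branches = len(val_scores)
    num_categories = len(val_scores[0])
    return [
        max(range(num_branches), key=lambda b: val_scores[b][c])
        for c in range(num_categories)
    ]


def fuse_predictions(
    branch_probs: Sequence[Sequence[float]], best_branch: Sequence[int]
) -> List[float]:
    """Take, for each category, the probability from its selected branch.

    branch_probs[b][c] is the probability branch b assigns to category c
    for a single image.
    """
    return [branch_probs[b][c] for c, b in enumerate(best_branch)]


# Toy validation scores: 3 branches (rows) x 3 categories (columns).
val_scores = [
    [0.90, 0.40, 0.70],  # branch 0, e.g. derived from a high-level layer
    [0.85, 0.65, 0.60],  # branch 1, e.g. derived from a middle layer
    [0.60, 0.55, 0.75],  # branch 2, e.g. derived from a low-level layer
]
best_branch = select_best_branches(val_scores)

# Toy per-branch probabilities for one test image.
branch_probs = [
    [0.95, 0.10, 0.30],
    [0.80, 0.20, 0.20],
    [0.40, 0.60, 0.85],
]
final_probs = fuse_predictions(branch_probs, best_branch)

# Assumed decision threshold of 0.5 per category.
predicted = [p >= 0.5 for p in final_probs]
print(best_branch, final_probs, predicted)
```

In this toy setup each category ends up served by a different branch, which mirrors the idea that features from different layers suit different objects. A learned selection module could replace the fixed validation-score lookup; the abstract leaves that design open.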