| With the rapid development of the Internet, intelligent devices of all kinds generate massive amounts of image data. With the emergence of large-scale manually labeled datasets, deep learning has achieved major breakthroughs in computer vision tasks such as image classification, image super-resolution, and object detection. However, manually labeling image data is tedious and time-consuming, whereas unlabeled data are cheap and easy to obtain. How to make effective use of massive unlabeled data has therefore become a research hotspot in computer vision. Self-supervised representation learning constructs supervision signals by designing pretext tasks and learns rich semantic representations from unlabeled data. However, many existing self-supervised representation learning methods require a large batch size during training to learn good visual representations, and a large batch size in turn demands substantial computing resources. To address these problems, this thesis makes the following contributions:
(1) A self-supervised representation learning method based on decoupled feature similarity learning and relation reasoning over differentiated feature aggregation. On the one hand, self-attention and cross-attention blocks are introduced to help the model learn the global semantic information of each image and the correlations between multi-view images. On the other hand, the features of multi-view images are aggregated in differentiated ways to form positive and negative sample pairs, and relation reasoning over these pairs helps the model learn finer-grained semantic information. By learning the decoupled feature similarity task and the differentiated aggregation relation reasoning task jointly, the model learns richer semantic representations even with a small batch size. Experimental results show that, compared with other self-supervised representation learning methods, this method achieves higher classification accuracy on multiple image classification datasets.
(2) We apply self-supervised representation learning to object detection and propose a self-supervised object detection method based on spatial scale learning and category prediction. By introducing a spatial scale information learning task and a category prediction task, the method helps the model learn the spatial scale relationships and category relationships between objects in an image without requiring additional manual labels.
(3) The detection model in the second work performs detection on a single-scale feature map. Because the information in a single feature map is not comprehensive, the model often fails to adapt to scale differences between objects in the image. To help the model handle these differences, we improve the feature pyramid network (FPN) by adding an attention module to it and use the resulting attention-augmented FPN as the feature extractor. Experimental results show that our method effectively improves the mAP (mean average precision) of the object detection model. |
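The self-attention and cross-attention blocks in contribution (1) can be illustrated with standard scaled dot-product attention. The sketch below is a minimal, dependency-free illustration, not the thesis implementation: it assumes each view is already a list of token feature vectors and omits the learned query, key, and value projections (and the multiple heads) that a real attention block would include. In cross-attention the queries come from one view and the keys and values from the other, so each output token summarizes the tokens of the other view that it most resembles.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention WITHOUT learned projections (a simplification).

    queries: token vectors from view A; keys and values: token vectors from view B.
    Returns one vector per query: a softmax-weighted sum of the view-B values.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def self_attention(tokens):
    # Self-attention is the special case where queries, keys and values share one view.
    return cross_attention(tokens, tokens, tokens)


# Hypothetical two-token views with 2-dimensional features.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0], [0.0, 1.0]]
attended = cross_attention(view_a, view_b, view_b)
```

When all keys are identical the softmax weights are uniform, so each output is simply the average of the values; attention only becomes selective when queries match some keys better than others.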
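Contribution (1) also forms positive and negative pairs from aggregated multi-view features and reasons about their relation. The abstract does not specify the aggregation or the relation head, so the sketch below makes hypothetical choices that are not the thesis method: the aggregation concatenates the element-wise product and absolute difference of two embeddings, negatives pair image i with image i+1 in the batch, and the relation head is a single logistic unit with given weights standing in for a trained network. The loss is binary cross-entropy over pair labels (1 for two views of the same image, 0 otherwise).

```python
import math


def aggregate(u, v):
    # HYPOTHETICAL aggregation: element-wise product followed by absolute difference.
    return [a * b for a, b in zip(u, v)] + [abs(a - b) for a, b in zip(u, v)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relation_score(pair_feature, weights, bias=0.0):
    # Stand-in relation head: one logistic unit (a trained MLP in practice).
    return sigmoid(sum(w * x for w, x in zip(weights, pair_feature)) + bias)


def build_pairs(view1, view2):
    """Positives: two views of the same image (label 1.0).
    Negatives: view2 shifted by one, so image i meets image i+1 (label 0.0).
    Assumes the batch holds at least two images."""
    n = len(view1)
    pairs = []
    for i in range(n):
        pairs.append((aggregate(view1[i], view2[i]), 1.0))
        pairs.append((aggregate(view1[i], view2[(i + 1) % n]), 0.0))
    return pairs


def relation_loss(view1, view2, weights, bias=0.0):
    # Mean binary cross-entropy between relation scores and pair labels.
    eps = 1e-12
    pairs = build_pairs(view1, view2)
    total = 0.0
    for feature, label in pairs:
        p = relation_score(feature, weights, bias)
        total -= label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps)
    return total / len(pairs)


# Hypothetical embeddings for two augmented views of a two-image batch.
batch_view1 = [[1.0, 0.0], [0.0, 1.0]]
batch_view2 = [[0.9, 0.1], [0.1, 0.9]]
loss = relation_loss(batch_view1, batch_view2, weights=[2.0, 2.0, -2.0, -2.0])
```

Because every image contributes one positive and one negative pair, the number of pairs grows linearly with the batch, which is one reason pair-based relation objectives are often discussed as less dependent on very large batches than methods that need many in-batch negatives.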
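Contribution (3) adds attention to a feature pyramid network. The abstract does not say which attention module is used, so the sketch below assumes a hypothetical squeeze-and-excitation-style channel gate (each channel scaled by the sigmoid of its global average, without the learned layers a real SE block has). It also omits the learned 1×1 lateral and 3×3 output convolutions: the inputs are taken to be already-projected lateral features with a shared channel count, merged top-down by nearest-neighbour 2× upsampling and addition as in a standard FPN.

```python
import math


def upsample2x(fmap):
    # Nearest-neighbour 2x upsampling of one H x W channel.
    out = []
    for row in fmap:
        wide = [x for x in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    # Element-wise sum of two equally sized H x W channels.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def channel_attention(feature):
    """HYPOTHETICAL SE-style gate: scale each channel by sigmoid(global average).
    feature: list of C channels, each an H x W list of lists."""
    gated = []
    for ch in feature:
        mean = sum(sum(r) for r in ch) / (len(ch) * len(ch[0]))
        g = 1.0 / (1.0 + math.exp(-mean))
        gated.append([[g * x for x in r] for r in ch])
    return gated


def top_down_fpn(laterals):
    """laterals: lateral features ordered finest (largest H x W) to coarsest;
    each level halves H and W, and all levels share the same channel count.
    Returns one attention-gated output per level, in the same order."""
    merged = [laterals[-1]]
    for level in reversed(laterals[:-1]):
        coarser = merged[0]
        # Standard FPN top-down step: upsample the coarser merged map and add.
        merged.insert(0, [add_maps(ch, upsample2x(up)) for ch, up in zip(level, coarser)])
    # The attention gate stands in for FPN's learned 3x3 output convolution.
    return [channel_attention(m) for m in merged]


# Hypothetical single-channel features: a 2x2 fine level and a 1x1 coarse level.
c4 = [[[0.0, 0.0], [0.0, 0.0]]]
c5 = [[[1.0]]]
pyramid = top_down_fpn([c4, c5])
```

Each pyramid level then carries context propagated down from coarser levels, which is what lets a multi-scale detector handle objects of different sizes better than a single-scale feature map.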