The detection of sea-surface ship targets in visible-light images has important application value in sea area management, territorial defense, ship rescue, and other related fields. However, fine-grained detection and recognition of ships at sea is challenging: differences in the azimuth angle of ships of the same type and structural similarity between ships of different types lead to large intra-class differences and small inter-class differences among samples. Improving the precision of fine-grained ship detection therefore remains an important and urgent problem. General object detection based on deep learning is the current mainstream approach, and improving detection performance requires jointly considering dataset construction and the optimized design of the detection model. However, existing ship detection and recognition datasets consist mainly of remote sensing images taken from a top-down view, while visible-light ship datasets taken from a horizontal view, especially datasets that meet practical task requirements, cover different angle ranges, and carry fine-grained category annotations, are lacking, which makes dataset construction a significant challenge. This paper focuses on these issues, and the main work is as follows:

(1) To address the lack of suitable visible-light ship datasets, a visible-light ship detection dataset, FSC-5, was constructed based on computer simulation technology and self-developed image annotation software, covering simulation images and real images from various angles and scenes. For the classification task, the objects in the dataset were cropped according to the VOC annotation information to obtain a classification subset, FSC-5C. Considering the high accuracy required by fine-grained visual classification, a sea-ship classification model based on domain adaptation, named TransDAC, was proposed under a detection-then-classification strategy. First, a vision Transformer was used as the backbone network to extract features, and its strength in fine-grained feature extraction was verified through experiments. Second, a domain adaptation training strategy was adopted, in which the simulation images and real images served as the source domain and target domain, respectively, and the two domains were aligned using the local maximum mean discrepancy. In addition, a contrastive loss function was adopted to minimize the similarity between different categories and maximize the similarity within the same category. Finally, ablation experiments were carried out on FSC-5C. The experimental results showed that the overall accuracy of the proposed domain-adaptation-based fine-grained ship classification model reached 96.0%, outperforming the other methods and fine-grained models compared on FSC-5C.
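To make the domain-adaptation training described in (1) concrete, the following PyTorch sketch combines a class-conditional (local) MMD term with a supervised contrastive term. It is only a minimal illustration under simplifying assumptions: the MMD term uses a linear kernel (per-class mean embeddings) rather than a full multi-kernel formulation, and the function names, temperature, and loss weights (lambda_da, lambda_con) are illustrative rather than the exact settings used in TransDAC.

```python
import torch
import torch.nn.functional as F

def lmmd_linear(src_feat, src_labels, tgt_feat, tgt_probs, num_classes):
    """Class-conditional (local) MMD with a linear kernel: align per-class
    mean embeddings of source and target features. Target samples are
    weighted by their predicted class probabilities, since target labels
    are unavailable."""
    loss = src_feat.new_tensor(0.0)
    src_onehot = F.one_hot(src_labels, num_classes).float()       # (Ns, C)
    for c in range(num_classes):
        ws = src_onehot[:, c]                                      # source weights
        wt = tgt_probs[:, c]                                       # target weights
        if ws.sum() < 1e-6 or wt.sum() < 1e-6:
            continue                                               # class absent in this batch
        mu_s = (ws.unsqueeze(1) * src_feat).sum(0) / ws.sum()
        mu_t = (wt.unsqueeze(1) * tgt_feat).sum(0) / wt.sum()
        loss = loss + (mu_s - mu_t).pow(2).sum()
    return loss / num_classes

def supervised_contrastive(feat, labels, temperature=0.1):
    """Pull together embeddings of the same ship class and push apart
    embeddings of different classes (supervised contrastive loss)."""
    feat = F.normalize(feat, dim=1)
    sim = feat @ feat.t() / temperature                            # (N, N) similarities
    mask_pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)                                     # exclude self-pairs
    logits = sim - torch.eye(len(feat), device=feat.device) * 1e9  # mask self in softmax
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = mask_pos.sum(1).clamp(min=1)
    return -(mask_pos * log_prob).sum(1).div(denom).mean()

# Illustrative combined objective (names and weights are assumptions):
# total = ce_loss(src_logits, src_labels) \
#         + lambda_da * lmmd_linear(src_feat, src_labels, tgt_feat, tgt_probs, C) \
#         + lambda_con * supervised_contrastive(src_feat, src_labels)
```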
(2) Considering the real-time limitations of the detection-then-classification approach, and the fact that simulation images cannot be obtained under some conditions, an end-to-end fine-grained ship detection model based on an improved YOLOv7 was proposed. First, a Swin Transformer replaced the YOLOv7 backbone network, building long-range feature dependencies through global attention to improve the quality of the feature extraction module. Second, a multi-stage attention module based on channel attention and spatial attention, named MAMCS, was introduced into the feature fusion module to compensate for the Swin Transformer's weak detection performance on small targets. Finally, the location regression and category classification of the prediction output were decoupled through DHCSP, whose classification branch is a CSP structure, to improve the classification performance on fine-grained categories and thereby the overall detection performance. The proposed model was evaluated on the ShipRSImageNet, Seaships, and FSC-5 datasets. The experimental results showed that the mean average precision of the proposed model on the three datasets increased by 2.4%, 1.0%, and 4.0%, respectively, compared with the baseline, which verified the effectiveness of the proposed model.
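The abstract does not specify the internal design of MAMCS; as a rough illustration of the channel-plus-spatial attention pattern it builds on, the sketch below implements a generic CBAM-style block in PyTorch. The class name, reduction ratio, and kernel size are assumptions, not the module used in the proposed model.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Illustrative channel-then-spatial attention block (CBAM-style),
    standing in for the channel + spatial attention that MAMCS builds on."""

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: squeeze spatial dims, then excite per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 2-channel (avg, max) map -> 1-channel weight map.
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel,
                                      padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention from global average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.channel_mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention from per-pixel channel statistics.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))
```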
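Likewise, the decoupling of location regression and category classification performed by DHCSP can be sketched schematically. The block below uses plain convolution stacks in both branches; the CSP structure of the actual classification branch is not reproduced, and the layer widths and anchor count are assumptions.

```python
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Schematic decoupled detection head: separate branches predict box
    regression plus objectness and fine-grained class scores from the same
    feature map. A stand-in for DHCSP, not its exact structure."""

    def __init__(self, in_channels, num_classes, num_anchors=3):
        super().__init__()

        def conv(c_in, c_out):
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(c_out), nn.SiLU(inplace=True))

        self.stem = conv(in_channels, in_channels)
        # Regression branch: box offsets (4) + objectness (1) per anchor.
        self.reg_branch = nn.Sequential(conv(in_channels, in_channels),
                                        nn.Conv2d(in_channels, num_anchors * 5, 1))
        # Classification branch: fine-grained class scores per anchor.
        self.cls_branch = nn.Sequential(conv(in_channels, in_channels),
                                        nn.Conv2d(in_channels, num_anchors * num_classes, 1))

    def forward(self, x):
        x = self.stem(x)
        return self.reg_branch(x), self.cls_branch(x)
```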