| High-quality fish species resources are the foundation of sustainable fishery and aquaculture development. Breeding and selection are the key steps in fish breeding. When selecting fish fry, phenotypic data are a very important reference, and efficient, accurate key point detection is decisive for measuring those data. Among non-contact methods, many machine-vision approaches to fish key point detection are overly complicated and limited in performance, whereas key point detection based on deep learning offers high precision, end-to-end operation, high efficiency, simplicity and easy system maintenance. This thesis takes yellow croaker, perch and crucian carp as its main research objects and designs an accurate and efficient Transformer-based algorithm for detecting fish key points. The main research contents and contributions of this thesis are as follows: 1. Image datasets of three fish species are constructed, the key point positions are defined, and a lightweight Transformer-based method for fish body key point detection is proposed. According to the requirements of phenotypic measurement, 10 types of key points are defined for three fish with similar body shapes, namely yellow croaker, perch and crucian carp. A total of 4907 images of the three species are collected in a fixed environment and then preprocessed, labeled, cleaned and split. The constructed dataset is used to evaluate the algorithms in this thesis. TransPose, a lightweight Transformer model, is transferred to fish key point detection and achieves better results than traditional CNN algorithms, while its parameter count and computational cost are significantly lower than those of several traditional CNN algorithms. 2. Two lightweight structures based on the self-attention mechanism are explored, and a lightweight hybrid channel-spatial self-attention Transformer model is
proposed. SA-SDC, a spatial self-attention mechanism with adaptive locality, and Simple-MHCSA, a simple channel self-attention mechanism, are designed. Then, following the idea of mixed attention, a lightweight hybrid channel-spatial self-attention Transformer model is designed and optimized on the basis of these two mechanisms. Compared with the baseline model, the proposed model significantly reduces GPU memory usage, inference time and training time at high resolution. At a resolution of 384×512, the frame rate increases by 30%, GPU memory usage decreases by 70%, and training time decreases by 9.7% relative to the baseline. The model keeps the detection accuracy of fish body key points essentially unchanged while reducing the computational cost by 36.1%. 3. To improve key point detection accuracy, a CNN-Transformer model with interactive feature reinforcement is proposed. The model comprises an hourglass encoder-decoder network structure, a multi-scale fusion module that outputs high-resolution heatmaps, and a high-level semantic multi-scale feature module, FPN-with-Encoders, which together significantly improve the overall performance of key point prediction. Compared with the baseline model, this model improves by 2.5 AP and 3.1 AR on the full test set, exceeding the other mainstream algorithms used for comparison. In addition, a complex test set is constructed for evaluation, which shows that the model copes better with data from different conditions. Finally, a generalization experiment with stepwise increases in training data is designed, and the results show that the model has better cross-species generalization and data utilization efficiency than HRNet. |