| Object detection is a fundamental task in computer vision and is widely used in autonomous driving, video surveillance, precision medicine, and other fields. The performance of an object detection model is evaluated mainly by detection accuracy and detection speed. Existing object detection methods usually employ fixed structures and hyper-parameters and make little use of the object attributes of hard samples. To improve detection accuracy, existing methods increase the complexity of the structure to fit the task function in a higher-dimensional feature space, strengthening classification and regression for objects that are difficult to detect in the current low-dimensional feature space. Typically, a 2% to 5% improvement in detection accuracy requires a 30% to 150% increase in computational complexity, which significantly slows detection. Hence, improving detection accuracy by adding computation and parameters cannot meet the requirements of many practical applications, such as autonomous driving and drone monitoring, that demand both high detection accuracy and high speed. An important research question is therefore how to better fit the task function in the current-dimensional feature space to improve detection performance on hard images. The challenge is how to fully use the object attributes of individual samples to adaptively reconstruct the model structure and determine hyper-parameters, given that the object attributes are unknown and difficult to obtain at the inference stage. Motivated by this, this paper proposes adaptive object detection models driven by image object attributes for three important modules of object detection models: the neck network, the region proposal network, and the detection head. The proposed methods predict and encode the object attributes at the inference stage and use them to adaptively design the structure and hyper-parameters, improving the detection 
accuracy with only a slight increase in computational complexity. The study makes three main contributions. (1) Because the structure of the neck network is fixed, its feature fusion strategy is also fixed. A fixed fusion strategy cannot provide enough high-level semantic information for an individual image with its own specific distribution of object sizes, making it difficult to achieve optimal detection performance on that image. Therefore, this paper proposes an object-size-adaptive feature fusion strategy for the neck network that encodes object size information at the inference stage and constructs a fusion matrix to adaptively reconstruct the fusion strategy for each image according to the encoded information, so that more high-level semantic information in the output feature maps is fed into the layers corresponding to the object sizes present in the image. Experimental results show that the proposed strategy improves AP@50 by 1.2 on the PASCAL VOC dataset with a slight latency increase. (2) The fixed number of proposals produced by the region proposal network (RPN) cannot adapt to the number of objects in an individual image, producing many false-positive prediction boxes on images with few objects and more missed detections on images with many objects. Therefore, the paper proposes an object-count-adaptive mechanism for determining the number of proposals: a foreground region prediction module predicts the number of foreground regions at the inference stage to estimate the object count, and the number of proposals is reduced according to the object count of the current sample. This ensures that individual images have fewer false positives without an increase in the miss rate, achieving higher detection accuracy across images with varying object counts. Experimental results show that the mechanism improves AP@50 by 1.4 on the PASCAL VOC dataset while increasing detection speed by 1.8 times. (3) Existing corner-based detection head networks predict and match corners to form prediction 
boxes. They simply take the average score of the corners as the confidence of the prediction box, so mismatched prediction boxes can receive high confidence because some of their corners score highly, and they become false-positive detections. Therefore, the paper proposes an object-key-point-adaptive method for calibrating the confidence of prediction boxes that uses the score differences and ID differences of matched corners to suppress the confidence of mismatched boxes, thereby enhancing the detection accuracy of the model. Experimental results show that the proposed method improves AP@50 by 3.1 on the MS-COCO dataset with no latency increase. In conclusion, this paper focuses on how to use the object attributes of individual images, including object size, object count, and corner information, to adaptively reconstruct the model structure and determine hyper-parameters at the inference stage, in order to better fit the task function in the low-dimensional feature space. This effectively overcomes the limitation of fixed model structures, which cannot improve accuracy and speed simultaneously. |
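
As an illustration of contribution (3), the following Python sketch shows one way the confidence of a corner-based prediction box could be suppressed using the score difference and ID difference of its matched corners. It is not the paper's formulation: the exponential penalty, the weights `alpha` and `beta`, the function name `calibrated_confidence`, and the treatment of each corner ID as a single scalar are all assumptions made for this example.

```python
import math


def calibrated_confidence(score_tl, score_br, id_tl, id_br, alpha=1.0, beta=1.0):
    """Hypothetical confidence calibration for a box formed by two matched corners.

    score_tl, score_br: corner scores in [0, 1] (top-left, bottom-right).
    id_tl, id_br: scalar corner IDs (assumed form; the abstract does not define them).
    alpha, beta: assumed weights on the score gap and the ID gap.
    """
    # Baseline used by existing heads: plain average of the corner scores.
    base = 0.5 * (score_tl + score_br)
    # Large gaps suggest the two corners do not belong to the same object.
    score_gap = abs(score_tl - score_br)
    id_gap = abs(id_tl - id_br)
    # Assumed penalty: equals 1 when both gaps are zero and decays as they grow.
    penalty = math.exp(-(alpha * score_gap + beta * id_gap))
    return base * penalty


# A consistent pair keeps most of its averaged score.
matched = calibrated_confidence(0.90, 0.88, 0.50, 0.52)
# One strong corner paired with a weak, differently tagged corner is suppressed.
mismatched = calibrated_confidence(0.95, 0.40, 0.10, 0.90)
print(matched, mismatched)
```

With plain averaging, the mismatched pair above would still receive a confidence of 0.675; under the assumed penalty it falls well below the matched pair, which is the qualitative behaviour the abstract describes. Because the calibration is a few scalar operations per box, a scheme of this kind is consistent with the reported lack of added latency.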