| Object detection and instance segmentation are two classic computer vision tasks that play an important role in scene text detection, license plate detection, pedestrian detection, remote sensing detection, and autonomous driving. Object detection must recognize every object of the given categories in an image and locate each one with a bounding box. Instance segmentation labels every instance with a pixel-level mask. In recent years, with the rapid development of GPUs and convolutional neural networks (CNNs), deep learning has become the mainstream approach in many computer vision fields, including object detection. This paper focuses on improving two core components of these tasks: the bounding box regression loss function, and the widely used Non-Maximum Suppression (NMS) step in post-processing, which has several shortcomings. Bounding box regression is the crucial step in object detection. Existing methods widely adopt the L_n-norm loss for bounding box regression, but this loss is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). The IoU loss and the generalized IoU (GIoU) loss were recently proposed to optimize the IoU metric directly, but they still suffer from slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss that incorporates the normalized distance between the predicted box and the target box, and it converges much faster than the IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance, and aspect ratio, and proposes a Complete IoU (CIoU) loss based on them, leading to faster convergence and better performance. By incorporating the DIoU and CIoU losses into state-of-the-art models, e.g., YOLOv3, SSD, Faster R-CNN, YOLACT, and BlendMask-RT, we achieve notable performance gains. For post-processing, the most widely used NMS is a greedy algorithm that uses IoU as its criterion and decides, one box at a time in a sequential manner, whether each box is kept or suppressed. It faces two problems. First, IoU is not a suitable criterion in cases of occlusion. Second, sequential processing is extremely time-consuming and cannot fully exploit the GPU for acceleration. To solve the first problem, this paper proposes DIoU-NMS, which uses DIoU instead of IoU as the criterion. To solve the second, this paper proposes Cluster-NMS, a GPU-accelerated NMS implemented with matrix-matrix operations, which greatly improves the processing speed of NMS; the geometric factors can also be easily incorporated into Cluster-NMS to further improve AP and AR. Extensive experiments demonstrate the effectiveness of the proposed methods. |
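
As a rough illustration of the DIoU idea summarized in the abstract, the sketch below computes IoU, DIoU, and a DIoU loss for two axis-aligned boxes, and uses DIoU as the criterion in a plain greedy NMS loop. It is not the author's implementation. The function names, the `(x1, y1, x2, y2)` box format, and the default threshold are assumptions made for illustration. The normalization (squared center distance divided by the squared diagonal of the smallest enclosing box) follows the commonly published DIoU formulation, which the abstract itself does not spell out. Cluster-NMS and the CIoU aspect-ratio term are not shown.

```python
def iou_and_diou(box_a, box_b):
    """Return (IoU, DIoU) for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2, so both boxes have positive area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap area (zero when the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the two box centers.
    center_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes
    # (assumed normalizer, taken from the published DIoU formulation).
    diag_sq = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou, iou - center_dist_sq / diag_sq


def diou_loss(pred, target):
    """DIoU loss: 1 - DIoU, so identical boxes give a loss of 0."""
    return 1.0 - iou_and_diou(pred, target)[1]


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS with DIoU as the suppression criterion.

    Returns indices of kept boxes, highest score first. The threshold
    default is an illustrative assumption, not a value from the source.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if its DIoU with every kept box is low enough.
        if all(iou_and_diou(boxes[i], boxes[k])[1] <= threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    print(diou_loss((0, 0, 1, 1), (2, 0, 3, 1)))
    print(diou_nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
                   [0.9, 0.8, 0.7]))
```

For two non-overlapping boxes, the IoU loss is constant at 1 no matter how far apart they are. The center-distance term still changes with the gap, which is the intuition behind the faster convergence claimed for DIoU. The sequential loop in `diou_nms` is the kind of box-by-box processing that the abstract says Cluster-NMS replaces with matrix operations.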