Study Of Instance Segmentation Approach Based On Deep Learning

Posted on: 2021-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2518306557486724
Subject: Electronics and Communications Engineering
Abstract/Summary:
As research in computer vision deepens, instance segmentation has become one of the field's active research topics. The goal of instance segmentation is to assign every foreground pixel in an image to the instance it belongs to. Traditional methods that rely on hand-crafted features have gradually been replaced by deep learning, and deep-learning-based methods have become the mainstream solution for instance segmentation. This thesis analyzes existing deep-learning-based instance segmentation methods and proposes a new one-stage, fully convolutional, anchor-free segmentation network based on the object detection network FCOS; the proposed network achieves performance comparable to existing classic methods. The thesis then applies the proposed method to real-time human instance segmentation and achieves good results. The main work is as follows:

1. The structure, advantages, and disadvantages of the instance segmentation networks Mask R-CNN and YOLACT are analyzed. Mask R-CNN, a representative classic two-stage instance segmentation network, adds a mask branch to Faster R-CNN to perform instance segmentation. It has a simple network design and high segmentation precision (up to 33.6 mAP), but its inference time grows in proportion to the number of regions of interest generated by the RPN, so it cannot meet real-time requirements. YOLACT builds instance segmentation on a one-stage object detection network: it generates instance masks by linearly combining prototype masks with mask coefficients and then cropping. YOLACT achieves real-time instance segmentation of 29.8 mAP at 33 fps, but its way of combining prototype masks and mask coefficients is relatively simple and fails to exploit the combination of low-level and high-level features, which makes its segmentation precision relatively low. FCOS is a one-stage, fully convolutional, anchor-free object detection network that achieves the highest detection precision among current one-stage object detection networks through dense prediction. This thesis therefore proposes a new instance segmentation network based on FCOS.

2. To address the low segmentation precision of one-stage instance segmentation networks, a new one-stage anchor-free instance segmentation method is proposed based on FCOS. It generates each instance mask by combining the mask coefficients with the region of interest of the prototype mask, and it achieves higher segmentation precision. To address the relatively simple mask generation method in YOLACT, a "channel-for-area" prototype mask is proposed, in which different channels of the prototype mask represent information about different positions of the instance mask. When the region of interest is combined with the mask coefficients, a piecewise linear combination is used, so that the upper and lower branches are both used more effectively to represent the instance mask, and a higher segmentation precision than YOLACT is obtained. Experiments on the COCO dataset show that the proposed method reaches a segmentation precision of 31.6 mAP, surpassing the classic MNC and FCIS as well as YOLACT and PolarMask, which are also one-stage methods.

3. To address real-time human instance segmentation, the network structure of the proposed general instance segmentation method is optimized, achieving real-time human instance segmentation with higher accuracy. Three structural optimizations are proposed:
1) Introduce depthwise separable convolutions, replacing most of the standard two-dimensional convolutions in the head and the prototype generation network of the base model proposed in this thesis, thereby improving inference speed.
2) Replace the original backbone, ResNet with FPN, with a lightweight backbone, MobileNetV2 with FPN, thereby further improving inference speed.
3) Add a semantic segmentation branch that assists training. This branch is not used during testing or deployment, so the accuracy of human segmentation improves without any additional inference computation.
The structure-optimized model achieves real-time human instance segmentation of 40.3 mAP at 26.7 fps on the person category of the COCO dataset.
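
The YOLACT-style mask assembly described in item 1 (linearly combining prototype masks with mask coefficients, then cropping) can be illustrated with a minimal pure-Python sketch. It is not the thesis's "channel-for-area" or piecewise linear method, whose details the abstract does not give. The function name `assemble_mask`, the box convention (pixel coordinates with exclusive right and bottom edges), and the 0.5 threshold are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assemble_mask(prototypes, coeffs, box, threshold=0.5):
    """Build one binary instance mask from shared prototype masks.

    prototypes: list of k maps, each an H x W list of lists of floats
    coeffs:     k per-instance mask coefficients
    box:        (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive
                (an assumed convention for this sketch)
    """
    height, width = len(prototypes[0]), len(prototypes[0][0])
    x1, y1, x2, y2 = box
    # Pixels outside the predicted box stay 0: this is the cropping step.
    mask = [[0] * width for _ in range(height)]
    for y in range(max(y1, 0), min(y2, height)):
        for x in range(max(x1, 0), min(x2, width)):
            # Linear combination of the k prototype values at this pixel.
            logit = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            mask[y][x] = 1 if sigmoid(logit) > threshold else 0
    return mask


# Toy example: two 3 x 3 prototypes, one instance.
proto_a = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
proto_b = [[0, 0, 1], [0, 0, 1], [1, 1, 1]]
instance_mask = assemble_mask([proto_a, proto_b], [2.0, -2.0], (0, 0, 2, 3))
print(instance_mask)  # [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
```

In this toy case the positive coefficient selects the region covered by the first prototype, the negative coefficient suppresses the second, and the crop zeroes everything outside the box. Because every instance shares the same prototypes and differs only in its coefficients and box, the per-instance cost is small, which is consistent with the real-time speed reported for YOLACT above.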
Keywords/Search Tags:Computer Vision, Deep Learning, Instance Segmentation, One-stage, Anchor-free