| Instance segmentation aims to segment instance regions from images and to distinguish individual instances for subsequent visual tasks. In recent years, with the development of convolutional neural networks, instance segmentation has been widely applied in autonomous driving, security monitoring, and other areas, and has become an active research topic in computer vision. This paper focuses on humans, a special category in the instance segmentation task. At present, in human instance segmentation, occlusion between human instances, coarse segmentation boundaries, and a lack of data resources still limit improvements in segmentation accuracy. Moreover, from the perspective of practical application, the high complexity of existing networks often leads to large memory consumption, which makes it difficult to balance speed and accuracy. To address these problems, this paper puts forward three solutions: (1) For the occlusion problem between instances, a human instance segmentation model based on a two-stream convolutional neural network is proposed. For the input images, different network structures are used to extract human pose features and context features. Then, a feature fusion block is designed that dynamically adjusts the weights of the pose stream and the context stream through gradient backpropagation, and the fused features are obtained by weighted fusion. Finally, the fused features are fed into the segmentation module to obtain the segmentation result. Experimental results on two public datasets show that this method can effectively alleviate the drop in segmentation accuracy caused by occlusion between human instances. (2) For the problem of coarse segmentation boundaries, a human instance segmentation network based on image edge refinement is proposed. To alleviate the loss of edge information caused by repeated upsampling, an edge refinement module is introduced on top of the fusion of human pose information. The edge refinement
module consists of three main steps. First, the N edge pixels whose predicted labels have the highest uncertainty are selected on the coarse-grained feature map using a random sampling method. Second, for the selected points, the corresponding feature vectors are predicted on the coarse-grained and fine-grained feature maps, respectively, and fused to obtain the point feature representation. Finally, the point feature representation is fed into the point prediction network for refined prediction. The coarse-grained feature map is upsampled iteratively and the above process is repeated until a mask at the original image size is obtained. This method can improve segmentation accuracy at image edges. (3) For slow inference and insufficient original data, a human instance segmentation network based on Transformer data augmentation and D-YOLACT is proposed. The framework is divided into two parts. In the first part, a large amount of unlabeled data is used to train the Transformer network. Once the network has largely taken shape, a small amount of labeled data is used to fine-tune it, and the data are augmented through data reconstruction to address the shortage of original data in severe occlusion scenarios. In the second part, a simple fully convolutional segmentation model, D-YOLACT, is used. To expand the receptive field, the 3 × 3 convolutions in the residual blocks of the last ten layers of the ResNet backbone are replaced by deformable convolutions, so that the model improves segmentation accuracy while maintaining speed. |
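
The first step of the edge refinement module (selecting the N most uncertain pixels from randomly sampled candidates) can be illustrated with a minimal sketch. The abstract does not say how uncertainty is measured, so this sketch assumes a binary foreground probability map and treats a pixel as more uncertain the closer its probability is to 0.5. The function names, the `oversample` factor, and the toy probability map are all hypothetical and are not taken from the paper.

```python
import random


def uncertainty(prob):
    """Assumed uncertainty score: 0.0 at prob = 0.5 (most uncertain), -0.5 at 0 or 1."""
    return -abs(prob - 0.5)


def select_uncertain_points(coarse_probs, n_points, oversample=3, seed=0):
    """Randomly sample up to oversample * n_points pixels, keep the n_points most uncertain.

    coarse_probs is a 2D list of foreground probabilities (a stand-in for the
    coarse-grained prediction); the result is a list of (row, col) tuples.
    """
    rng = random.Random(seed)
    height, width = len(coarse_probs), len(coarse_probs[0])
    all_pixels = [(r, c) for r in range(height) for c in range(width)]
    # Random sampling step: draw candidate pixels without replacement.
    n_candidates = min(len(all_pixels), oversample * n_points)
    candidates = rng.sample(all_pixels, n_candidates)
    # Rank candidates from most to least uncertain and keep the top n_points.
    candidates.sort(key=lambda rc: uncertainty(coarse_probs[rc[0]][rc[1]]), reverse=True)
    return candidates[:n_points]


# Toy 4 x 4 coarse prediction with a soft boundary in the third column.
coarse_probs = [
    [0.02, 0.10, 0.48, 0.90],
    [0.03, 0.20, 0.55, 0.95],
    [0.01, 0.15, 0.60, 0.97],
    [0.04, 0.05, 0.70, 0.99],
]

# oversample=6 makes the candidate set cover every pixel of this small map.
points = select_uncertain_points(coarse_probs, n_points=3, oversample=6)
print(points)
```

In this toy map the selected points fall on the soft boundary column, which is where a refinement step would re-predict labels from fused coarse and fine features. A larger `oversample` value biases selection more strongly toward uncertain regions, while a value of 1 reduces to plain random sampling.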