| Panoptic segmentation partitions and labels an entire image containing multiple objects and scene elements according to their semantics: every pixel is classified and assigned to a category. Its main purpose is to divide a scene image into distinct regions so that the different elements of the scene can be better understood and analyzed. It is widely used in virtual reality, augmented reality, autonomous driving, and environmental monitoring. Most existing panoptic segmentation algorithms suffer from high computational cost and insufficient accuracy. EfficientPS offers a solution, but its performance still leaves room for improvement. A panoptic segmentation network based on multi-scale feature enhancement is therefore proposed. In the backbone, on the one hand, a residual network with a recursive layer aggregation structure is used to better reuse the features extracted by shallow layers without adding redundancy, which also improves its ability to learn structural information in the image; on the other hand, a channel diversification module is added after the two-way feature pyramid network to counteract the tendency of convolutional networks to concentrate on a few dominant channel features as depth increases, which strengthens the backbone's feature extraction. In the semantic segmentation head, a skip-connection branch and a global attention module are added so that the extracted features incorporate global information. Experiments show that on the Cityscapes dataset the network improves panoptic segmentation quality by 0.9% over EfficientPS, while segmentation accuracy on foreground objects and background stuff regions improves by 0.5% and 1.3%, respectively. On the KITTI dataset, panoptic segmentation quality improves by 0.8% over EfficientPS. The Transformer, which has achieved great success in natural language processing, has been introduced into computer vision and has brought substantial progress to the field. However, although it improves model performance, the Transformer architecture has several problems in image processing: high computational complexity, disregard for the two-dimensional structure of images, and a lack of channel adaptability. Building on Panoptic SegFormer, a Transformer-based single-network model, a multi-scale large-kernel convolutional attention mechanism is designed. It retains the advantages of combining convolution with self-attention and avoids the loss of two-dimensional image structure that affects segmentation when self-attention is applied to computer vision tasks, thereby reducing the number of model parameters and the amount of computation. Experiments show that on the COCO dataset the network improves panoptic segmentation quality by 1.1% over Panoptic SegFormer. |
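
The abstract reports its results as panoptic segmentation quality but does not define the metric. Assuming it refers to the standard panoptic quality (PQ) measure, in which a predicted segment and a ground-truth segment count as a match when their IoU exceeds 0.5, a minimal sketch of the computation for one category could look as follows. The function name, its parameters, and the sample values are hypothetical and chosen only for illustration; they are not taken from the work itself.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Compute PQ for one category (illustrative sketch, assumed standard definition).

    matched_ious: IoU values of matched (true-positive) segment pairs,
                  each assumed to be > 0.5.
    num_false_positives: predicted segments with no ground-truth match.
    num_false_negatives: ground-truth segments with no predicted match.
    """
    true_positives = len(matched_ious)
    # PQ = sum of matched IoUs / (TP + 0.5 * FP + 0.5 * FN)
    denominator = (true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        # No segments of this category in prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


# Made-up example: two matched segments, one spurious prediction,
# one missed ground-truth segment.
example_pq = panoptic_quality([0.8, 0.6], 1, 1)
print(f"PQ = {example_pq:.3f}")
```

Under this definition, the overall score is usually the average of the per-category values, and averaging separately over foreground object categories and background region categories gives the two partial scores that the abstract reports alongside the overall figure.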