The rapid advance of deep learning in recent years has opened up a wealth of new opportunities for machine vision, particularly in image segmentation, object detection, and scene understanding. Image segmentation, a fundamental task underpinning image processing, attracts broad interest in both academia and industry. Among its variants, panoptic segmentation is an emerging scene understanding technique that identifies and separates the distinct objects and scene elements in an image. It has a wide range of applications and can help vision systems in fields such as autonomous driving, augmented reality, robot vision, and smart homes recognize scene information more accurately and efficiently.

Most previous panoptic segmentation models adopt a two-branch framework, and models built on this framework fall into two-stage and single-stage variants. The inference speed of single-stage models is closer to practical application requirements, but they lack mechanisms for fusing multi-scale features, so the subsequent computation misses rich contextual information; this weakens the model's ability to perceive object contours and to accurately identify small objects. Another line of work builds panoptic segmentation frameworks on the Transformer architecture and its self-attention mechanism. These models, however, construct insufficient information encoding on large-scale (high-resolution) feature maps, which degrades the coverage of the output masks and drags down the performance of the whole model.

To address these problems, this thesis improves on the shortcomings of current panoptic segmentation models. The main work and contributions are as follows.

(1) This thesis examines the single-stage panoptic segmentation model and, because multi-scale features are essential to comprehensive scene understanding, proposes a multiple-feature-fusion structure that supplies the panoptic segmentation decoder with feature maps carrying multi-scale information and enriched context. At the same time, an adaptive weighted atrous spatial pyramid pooling structure is introduced to improve the segmentation performance of the single-stage model while ensuring that computational efficiency is not greatly affected. Specifically, the model becomes more robust to light-source interference, perceives object contours better, and more reliably detects objects that occupy few pixels.

(2) To address the problem that Transformer-based panoptic segmentation models struggle in overall performance because of poor output mask quality, this thesis first generates a set of pyramid feature maps carrying global pixel-correlation information with an improved pyramid-structured visual feature extraction model, and then applies a simple encoding unit proposed in this thesis, the residual expansion excitation module, to perform additional encoding on the high-resolution feature maps in this set. This supplies sufficient pixel-correlation information for the mask output while avoiding high computational cost, thereby improving the model's overall segmentation performance. In particular, the model becomes more resistant to the influence of the natural environment and can accurately segment natural scenes that are difficult to predict.

The proposed methods were evaluated on two mainstream natural-scene datasets, Cityscapes and MS COCO. Across repeated comparative experiments, the improved methods achieve stable gains on most evaluation metrics. In addition, the comparison of visualization results confirms that the methods in this thesis remain notably robust under different complex environmental disturbances.
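The adaptive weighting idea behind contribution (1) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration, not the thesis implementation: the function names, the number of branches, and the use of softmax-normalized scalar weights (which in a real model would be learned parameters) are all hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of branch logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_weighted_fusion(branches, logits):
    """Fuse same-shape branch outputs (e.g. the dilation branches of an
    ASPP head) with softmax-normalized adaptive weights, so the model can
    emphasize the receptive-field scale most useful for the scene."""
    w = softmax(logits)
    return sum(wi * b for wi, b in zip(w, branches))

# Toy example: three pyramid-branch outputs of shape (channels, H, W).
rng = np.random.default_rng(0)
branches = [rng.standard_normal((8, 16, 16)) for _ in range(3)]
logits = np.array([0.5, 1.0, -0.2])  # learnable scalars in a real model
fused = adaptive_weighted_fusion(branches, logits)
assert fused.shape == (8, 16, 16)
```

Normalizing the weights with a softmax keeps the fused map on the same scale as the individual branches regardless of how the raw weights drift during training.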
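The encoding unit in contribution (2) can likewise be sketched in a generic squeeze-and-excitation style: global pooling, a bottleneck, a sigmoid channel gate, and a residual shortcut. This is only a plausible reading of a "residual excitation" block under stated assumptions; the thesis's actual residual expansion excitation module may be structured differently, and the weights here are random stand-ins for learned parameters.

```python
import numpy as np

def residual_excitation(x, reduction=4, seed=1):
    """Hypothetical residual excitation sketch: pool channels globally,
    squeeze through a bottleneck, gate channels with a sigmoid, and add
    the gated features back onto the input (residual shortcut)."""
    c, h, w = x.shape
    rng = np.random.default_rng(seed)
    # Random stand-ins for learned bottleneck weights.
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = x.mean(axis=(1, 2))               # global average pooling -> (c,)
    z = np.maximum(w1 @ s, 0)             # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # sigmoid channel gate
    return x + x * gate[:, None, None]    # excitation with residual shortcut

# Applying the block to a high-resolution feature map leaves its shape intact,
# so it can be dropped into a pyramid of feature maps at any level.
x = np.random.default_rng(0).standard_normal((8, 16, 16))
out = residual_excitation(x)
assert out.shape == x.shape
```

A block of this shape adds only two small matrix multiplications per feature map, which is consistent with the abstract's goal of enriching pixel-correlation information on high-resolution maps without high computational cost.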