Image semantic segmentation is an important branch of computer vision. Its main purpose is to partition an image into regions according to their semantic categories. With the rapid development of artificial intelligence, image semantic segmentation has been widely applied in many fields, and emerging applications such as autonomous driving and medical image analysis increasingly rely on accurate and fast segmentation. The main contents of this paper are as follows:

(1) Semantic segmentation based on traditional methods suffers from complex pipelines and slow speed, while semantic segmentation based on convolutional neural networks often produces inaccurate results. To address the shortcomings of both, an image semantic segmentation method based on candidate region generation and an improved deep residual network is proposed in this paper. First, a set of candidate target regions is generated in the image to be segmented using a grouping-based region proposal method. Then, dilated convolutions with different rates are used to improve the ResNet-50 deep residual network, enlarging the receptive field without increasing the amount of computation, and this network is used to extract the visual features of each candidate region. Finally, the feature map of each candidate region is reduced by averaging over all pixel positions, and a softmax classifier predicts the category of each region. Experimental results show that this method outperforms other traditional segmentation methods in classification accuracy.

(2) To adapt to candidate regions of different sizes while fully preserving the details of each region, an image semantic segmentation method based on multi-scale candidate region features and multi-model fusion is proposed in this paper. First, a global visual feature map of the original image is extracted with the deep residual network improved by multi-scale dilated convolutions. Then, image candidate regions, each described by its size, position, and foreground mask, are generated by selective search. Next, using the global feature map and the candidate regions, fixed-size visual features are extracted for each region by ROI pooling; the foreground features of a region are obtained by multiplying every channel of its region features by the corresponding foreground mask, and these are concatenated with the region features to form the single-model visual features. Multiple single-model feature extraction networks are then trained with different candidate region sizes. Finally, the features from these networks are fused with mean-based and voting-based strategies from neural-network ensemble learning, and a softmax classifier predicts the category of each region. Experimental results show that the proposed multi-model method outperforms the single-model method in segmentation accuracy.
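The first two methods share a dilated ResNet-50 feature extractor and a region-level fusion step. The following PyTorch sketch illustrates one way these pieces could fit together; the dilation configuration, ROI output size, spatial scale, and all function names are illustrative assumptions rather than the exact design used in this paper.

```python
# A minimal sketch, assuming a dilated ResNet-50 backbone and torchvision's
# roi_align; dilation rates, ROI size, and names are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_align


def dilated_resnet50_backbone():
    # Replace the strides of the last two ResNet-50 stages with dilated
    # convolutions, so the receptive field grows without adding parameters
    # and the feature map stays at 1/8 of the input resolution.
    resnet = torchvision.models.resnet50(
        weights=None, replace_stride_with_dilation=[False, True, True])
    # Drop the average-pooling and fully connected layers to keep the
    # spatial feature map: output shape (B, 2048, H/8, W/8).
    return nn.Sequential(*list(resnet.children())[:-2])


def region_features(feature_map, boxes, masks, output_size=7, scale=1.0 / 8):
    """Fixed-size visual features for each candidate region, fused with the
    region's foreground mask.

    feature_map: (1, C, H/8, W/8) global features from the dilated backbone
    boxes:       (N, 4) candidate boxes as (x1, y1, x2, y2) in image pixels
    masks:       (N, 1, output_size, output_size) binary foreground masks,
                 assumed here to be already resized to the ROI grid
    scale:       ratio of feature-map size to image size (1/8 in this setup)
    """
    # One (C, output_size, output_size) feature tensor per candidate region.
    rois = roi_align(feature_map, [boxes], output_size=output_size,
                     spatial_scale=scale)
    # Multiply every channel by the foreground mask to keep only the
    # foreground response, then concatenate with the full region features.
    foreground = rois * masks
    return torch.cat([rois, foreground], dim=1)  # (N, 2C, out, out)
```

For the first method, each region's features would then be averaged over the spatial dimensions (e.g. `rois.mean(dim=(2, 3))`) and fed to a softmax classifier; for the second, several such extractors with different `output_size` settings would be trained and their predictions combined by the mean or voting rules mentioned above.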
(3) To balance segmentation accuracy against computational complexity, an image semantic segmentation method based on multi-scale feature spatial attention fusion is proposed within a Transformer encoder-decoder framework. In the encoding stage, an Xception network with a downsampling stride of 16 serves as the backbone to extract multi-scale low-level visual features, which are fused by 1×1 convolutions into high-level visual features. In the decoding stage, a multi-feature spatial attention aggregation strategy selects the low-level visual features extracted at three different stages of the backbone and concatenates and fuses them with the high-level visual features. Experimental results show that, compared with traditional image semantic segmentation methods, the proposed method achieves higher segmentation accuracy with less computation.
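As a rough illustration of the decoder-side fusion described above, the sketch below reweights three low-level feature maps with a spatial attention map before concatenating them with the high-level features. The channel widths, the sigmoid-based convolutional attention form (used here in place of a full Transformer decoder), and the module names are assumptions for illustration only; the Xception backbone is assumed to be provided separately, since it is not bundled with torchvision.

```python
# A minimal sketch of multi-scale spatial-attention fusion in the decoder;
# channel sizes and the attention form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """Per-pixel attention weights computed from channel-wise avg/max pooling."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class AttentionFusionDecoder(nn.Module):
    def __init__(self, low_channels=(128, 256, 728), high_channels=2048,
                 mid_channels=48, num_classes=21):  # 21 classes as in PASCAL VOC (assumption)
        super().__init__()
        # 1x1 projections bring each low-level stage to a common width.
        self.proj = nn.ModuleList(
            nn.Conv2d(c, mid_channels, kernel_size=1) for c in low_channels)
        self.attn = nn.ModuleList(SpatialAttention() for _ in low_channels)
        self.high_proj = nn.Conv2d(high_channels, 256, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(256 + mid_channels * len(low_channels), 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1))

    def forward(self, low_feats, high_feat):
        # Upsample everything to the resolution of the shallowest stage.
        size = low_feats[0].shape[-2:]
        fused = [F.interpolate(self.high_proj(high_feat), size=size,
                               mode="bilinear", align_corners=False)]
        for feat, proj, attn in zip(low_feats, self.proj, self.attn):
            f = proj(F.interpolate(feat, size=size, mode="bilinear",
                                   align_corners=False))
            fused.append(f * attn(f))  # spatial attention reweighting
        return self.fuse(torch.cat(fused, dim=1))
```

The resulting class-score map would then be bilinearly upsampled to the input resolution and passed through softmax, matching the prediction step described above.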