Image semantic segmentation is a fundamental preprocessing step for computer vision tasks and is widely used in areas such as autonomous driving, scene parsing, and robotics. With the rapid development of deep learning, image semantic segmentation methods often rely on deeper neural networks to achieve high accuracy. This makes data-driven segmentation models complex: they are time-consuming to train, demand substantial GPU resources, and depend on costly data annotation. To reduce training cost, methods based on weakly supervised learning have attracted increasing attention. Among them, methods that use image-level annotations as supervision require the lowest annotation cost; however, because the supervision signal is weak, existing methods struggle to achieve good results.

To improve the accuracy of image semantic segmentation, this paper focuses on the problems of complex models, slow inference, the inability to segment a specific target, and the high cost of annotating training data. By analyzing the strengths and weaknesses of existing image semantic segmentation algorithms, we explore how to build a lightweight segmentation model, improve segmentation accuracy, and complete the segmentation task under weakly supervised training. The main contributions of this paper are as follows:

(1) To address the complexity of existing models and the large amount of GPU resources they require for training, this paper proposes an image semantic segmentation algorithm based on multi-scale feature fusion. The algorithm extracts image features with lightweight shallow convolutions, refines the fused multi-scale features with an attention mechanism, and then produces the segmentation result. Experiments show that the algorithm achieves high accuracy and competitive speed while using only a single GPU.

(2) Algorithm (1) can segment the full scene but cannot segment a specific target. To address this, this paper proposes an image semantic segmentation algorithm based on multimodal feature fusion. The algorithm combines image segmentation and object detection conditioned on a text expression, and optimizes the loss function so that the two tasks fit a single segmentation model. Experiments show that the proposed algorithm achieves better real-time performance and higher accuracy than existing methods.

(3) Algorithms (1) and (2) require a large number of pixel-level annotations during training, and such data are difficult to obtain. To reduce annotation cost, in a training setting where only image-level annotations are available, this paper proposes a weakly supervised image semantic segmentation algorithm with an iterative dense conditional random field (dCRF). The algorithm uses a graph convolutional network for feature propagation to obtain an initial segmentation, and then iteratively applies the dCRF to obtain a refined segmentation. Experimental comparisons show that the proposed algorithm outperforms methods that apply only a single dCRF pass and produces more refined segmentation results.
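To make the idea in contribution (1) concrete, the following is a minimal PyTorch sketch of multi-scale feature fusion followed by attention-based refinement. The layer widths, the squeeze-and-excitation style channel attention, and the module names are illustrative assumptions, not the exact architecture proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention (squeeze-and-excitation style; illustrative choice)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))       # global average pool -> per-channel weights
        return x * w[:, :, None, None]        # re-weight the fused feature map

class MultiScaleFusionSeg(nn.Module):
    """Lightweight shallow encoder + multi-scale fusion + attention + pixel classifier."""
    def __init__(self, num_classes, width=32):
        super().__init__()
        self.stem  = nn.Sequential(nn.Conv2d(3, width, 3, 2, 1),
                                   nn.BatchNorm2d(width), nn.ReLU(inplace=True))
        self.down1 = nn.Sequential(nn.Conv2d(width, width * 2, 3, 2, 1),
                                   nn.BatchNorm2d(width * 2), nn.ReLU(inplace=True))
        self.down2 = nn.Sequential(nn.Conv2d(width * 2, width * 4, 3, 2, 1),
                                   nn.BatchNorm2d(width * 4), nn.ReLU(inplace=True))
        fused_channels = width + width * 2 + width * 4
        self.attn = ChannelAttention(fused_channels)
        self.head = nn.Conv2d(fused_channels, num_classes, 1)

    def forward(self, x):
        f1 = self.stem(x)                     # 1/2 resolution
        f2 = self.down1(f1)                   # 1/4 resolution
        f3 = self.down2(f2)                   # 1/8 resolution
        size = f1.shape[2:]
        fused = torch.cat([
            f1,
            F.interpolate(f2, size=size, mode="bilinear", align_corners=False),
            F.interpolate(f3, size=size, mode="bilinear", align_corners=False),
        ], dim=1)                             # concatenate features from all scales
        logits = self.head(self.attn(fused))  # attention-refined fusion -> per-pixel logits
        return F.interpolate(logits, scale_factor=2, mode="bilinear", align_corners=False)

# Usage: per-pixel class logits for a batch of RGB images.
model = MultiScaleFusionSeg(num_classes=21)
out = model(torch.randn(1, 3, 256, 256))      # -> shape (1, 21, 256, 256)
```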
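For contribution (2), a common way to condition segmentation on a text expression is to tile a sentence embedding over the visual feature map and train segmentation and box prediction with a joint loss. The sketch below follows that pattern; the fusion scheme, loss weighting, and all module names are hypothetical and only illustrate the kind of multimodal fusion and joint objective described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextVisualFusion(nn.Module):
    """Tile a sentence embedding over the feature map and fuse with a 1x1 convolution."""
    def __init__(self, vis_channels, text_dim, out_channels):
        super().__init__()
        self.fuse = nn.Conv2d(vis_channels + text_dim, out_channels, 1)

    def forward(self, vis_feat, text_emb):
        b, _, h, w = vis_feat.shape
        text_map = text_emb[:, :, None, None].expand(b, text_emb.shape[1], h, w)
        return F.relu(self.fuse(torch.cat([vis_feat, text_map], dim=1)))

def joint_loss(seg_logits, seg_target, box_pred, box_target, lam=1.0):
    """Combined objective: mask loss for the referred object plus a box regression loss."""
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_target)
    det_loss = F.smooth_l1_loss(box_pred, box_target)
    return seg_loss + lam * det_loss
```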
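For contribution (3), the refinement step can be sketched with the pydensecrf library: the class probabilities from the initial segmentation serve as unary potentials, the dense CRF is run, and its marginals are fed back in as the unaries for the next round. The number of rounds, kernel parameters, and the feedback loop itself are illustrative assumptions rather than the exact refinement scheme used in the paper.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def iterative_dcrf(image, probs, rounds=3, steps=5):
    """Iteratively refine class probabilities with a dense CRF (illustrative sketch).

    image : HxWx3 uint8 RGB image
    probs : CxHxW float32 class probabilities, e.g. from the GCN-propagated
            initial segmentation
    """
    image = np.ascontiguousarray(image)
    n_classes, h, w = probs.shape
    for _ in range(rounds):
        d = dcrf.DenseCRF2D(w, h, n_classes)
        d.setUnaryEnergy(unary_from_softmax(probs))        # -log(p) as unary potentials
        d.addPairwiseGaussian(sxy=3, compat=3)             # smoothness kernel
        d.addPairwiseBilateral(sxy=80, srgb=13,
                               rgbim=image, compat=10)     # appearance kernel
        q = np.array(d.inference(steps)).reshape(n_classes, h, w)
        probs = q.astype(np.float32)                       # feed refined marginals into the next round
    return probs.argmax(axis=0)                            # final label map
```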