| With the continuous development of science and technology, image processing techniques such as object detection, face recognition, and behavior recognition have gradually entered everyday life. Semantic segmentation is an indispensable task in computer vision: it assigns a semantic label to every pixel in an image, so it not only identifies the correct category but also locates the object. Semantic segmentation is now widely applied in autonomous driving, scene analysis, and video surveillance. Two questions have therefore become central to the task: how to improve recognition accuracy, and how to classify each pixel efficiently with lower time and space complexity. This thesis studies image semantic segmentation based on deep learning. First, the basic principles of deep learning, the principles of convolutional neural networks, and the semantic segmentation process of the FCN are introduced in detail. Then the backbone network SegNet is improved for natural images and for remote sensing images respectively, and comprehensive experiments verify that the segmentation performance of the model improves significantly. The research work of this thesis is divided into the following parts. Several methods are proposed for natural images. First, to address the information that is lost in the backbone model and cannot be recovered by upsampling, this thesis proposes a hierarchical feature fusion method based on the attention mechanism. This method fuses the feature maps of each encoder stage with those of the corresponding decoder stage to supplement the information available during upsampling. In addition, an attention mechanism is added to the fusion layer to obtain a per-pixel attention weight and thereby improve segmentation performance. Second, to capture context information at different scales in the feature map, a multi-scale feature extraction module based on depthwise-separable convolution is proposed. It uses convolution kernels of different sizes to extract information over different ranges, while the depthwise-separable convolution reduces the number of parameters. Third, to improve segmentation accuracy, both methods are applied to the encoder-decoder structure simultaneously, yielding a dense segmentation network, DenseSegNet. Finally, experiments are carried out under the Caffe deep learning framework to verify each method. The final model attains 79.3% mIoU on the Pascal VOC 2012 dataset, an improvement of 19.4% over the backbone model SegNet.
The DenseSegNet network is then optimized for remote sensing images. First, because the categories in remote sensing images are complicated and easily confused, a tree model based on DenseSegNet, TreeNet, is proposed. This model contains four branches, which are taken from the outputs at different network depths. During training, the shallow layers extract features that separate clearly distinct categories and the deep layers extract features that separate similar categories, so that segmentation proceeds from simple categories to complex ones. Moreover, the original model applies dilated convolution to improve the resolution, but dilated convolution can increase the number of parameters. A pyramid upsampling method is therefore proposed to reduce computation and memory occupancy without reducing accuracy. Finally, to address class imbalance, a weighted focal cross-entropy loss is used, and the average of the branch losses in the tree model is taken as the overall loss. The final average F1 score reaches 80.5% on the zhongwei dataset produced in this thesis, and the number of parameters is reduced by 31.2M. |
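
The loss described at the end of the abstract can be illustrated with a minimal sketch. It assumes that the weighted cross-entropy in question is the common class-weighted focal loss, FL = -w_c (1 - p_t)^gamma log(p_t), and that each branch of the tree model is scored against its own label map before the branch losses are averaged. The function names, the `gamma` default, and the data layout are hypothetical and are not taken from the thesis.

```python
import math


def weighted_focal_loss(probs, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for a single pixel.

    probs: predicted class probabilities for the pixel (assumed to sum to 1).
    target: index of the ground-truth class.
    class_weights: per-class weights, larger for rare classes (assumption).
    gamma: focusing parameter; gamma = 0 gives weighted cross-entropy.
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def branch_loss(pixel_probs, targets, class_weights, gamma=2.0):
    """Mean focal loss over all pixels seen by one branch."""
    losses = [
        weighted_focal_loss(p, t, class_weights, gamma)
        for p, t in zip(pixel_probs, targets)
    ]
    return sum(losses) / len(losses)


def tree_loss(branches, gamma=2.0):
    """Average of the per-branch losses, used as the overall loss.

    branches: list of (pixel_probs, targets, class_weights) tuples,
    one per branch (hypothetical layout).
    """
    per_branch = [
        branch_loss(probs, targets, weights, gamma)
        for probs, targets, weights in branches
    ]
    return sum(per_branch) / len(per_branch)


if __name__ == "__main__":
    # Two toy branches, two pixels each, two classes.
    shallow = ([[0.9, 0.1], [0.3, 0.7]], [0, 1], [1.0, 2.0])
    deep = ([[0.6, 0.4], [0.2, 0.8]], [0, 1], [1.0, 2.0])
    print(tree_loss([shallow, deep]))
```

With `gamma` set to 0 the focusing term disappears and the function reduces to ordinary class-weighted cross-entropy, which makes it easy to compare the two losses on the same data.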