Font Size: a A A

Semantic Image Segmentation Method Based On Deep Learning

Posted on:2019-07-23Degree:MasterType:Thesis
Country:ChinaCandidate:J ZhangFull Text:PDF
GTID:2348330563954468Subject:Electronics and Communications Engineering
Abstract/Summary:PDF Full Text Request
Semantic image segmentation has been an important research topic in the community of computer vision and deep learning.There are three challenges in the application of DCNNs to semantic segmentation:(1)reduced feature resolution and loss of spatial information caused by down-sampling operation,(2)difficulties to deal with objects at multiple scales due to the fixed field-of-view,(3)lack of methods to capture global contexts.To provide some solutions on these issues,this paper make improvements based on related works:(1)Because of the local receptive field of convolution,FCN for semantic segmentation cannot directly model the contexts dependence between distant pixels.On the bacis of application of RNN on image classification,we extended LSTM to 2-dimentional image by sweeping across the pixels grid in vertical and horizontal directions to obtain sequence which is the input of LSTM network.This spatial LSTM network can directly capture global contexts and was evaluated on CamVid dataset.Furthermore,we integrated spatial LSTM with FCN,where the FCN fused intermediate layer feature to fit with Multiscale object and the spatial LSTM modeled the global reliance between pixels.(2)On the side of reduced resolution of DCNN,we proposed cascade dilated convolution in use of dilated convolution that can maintain feature resolution and field-of-view at the same time.We used different dilated rates that not be multiple for each other to solve the checkboard problem in cascade dilated convolution.On the other side of existence of multiple objects,based on pyramid pooling and atrous spatial pyramid pooling,we proposed a parallel multiscale module through adding a 1×1 convolution and pooling layers with different strides.The 1×1 convolution can retain feature information of previous layers and pooling layers can obtain global information at different scale.Inspired by encoder-decoder structure,we designed a simple decoder module that upsampled feature map in two steps by fusing low-layer features.Benefitting from these modifications,the model that integrated contexts and multiscale information achieve competitive results on PASCAL VOC 2012 validation set.
Keywords/Search Tags:Semantic Segmentation, Deep Learning, LSTM, Dilated Convolution, Pyramid Pooling
PDF Full Text Request
Related items