Font Size: a A A

Research On Real-Time Image Semantic Segmentation Using Deep Learning

Posted on:2021-01-24Degree:MasterType:Thesis
Country:ChinaCandidate:Saqib MamoonFull Text:PDF
GTID:2518306512992359Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Semantic segmentation plays an important role in the interpretation of images.It is a collective task of image classification,detection and localization.Classification ensures that each picture is classified as the same category.Detection refers to the localization and identification of artifacts.Image segmentation can be treated as pixel-level prediction because it classifies each pixel into its category.In the past,semantic segmentation also has a lot of computer vision and machine learning approaches,and most of them focus on hand-engineered technologies that independently identify pixels.The segmentation performance has been significantly enhanced since the re-emergence of Deep Neural Network(DNN).Although Deep Neural Networks(DNNs)have achieved great success in semantic segmentation tasks,it is still challenging for real-time application.A large number of feature channels,parameters,and floating points(FLOPs)make the network sluggish and computationally heavy,which is not desirable for real-time tasks such as in robotics and autonomous driving.Designing a lightweight semantic segmentation network often requires researchers to find a trade-off between performance and speed,which is always empirical due to the limited interpretability of neural networks.In this research we investigate and propose a light-weight and fast segmentation network named stage pooling semantic segmentation network(SPSSN)using deep neural networks.SPSSN efficiently reuse the paramount features from early and intermediate layers at multiple stages in the network,at different spatial-resolutions.The network consist of three major parts,a deep branch,a shallow branch and four stage pooling modules.Deep branch extracts the features and downsample the image upto 1/32 times of the original resolution,and shallow branch operates at high spatial resolution to refine the spatial details.The stage pooling module takes the input from the deep branch,extract the learned features at different spatial resolutions,the output of each module is then introduced to the shallow branch.We simply perform element-wise addition to fuse the features of all modules to keep the model light-weight.In order to capture more contextual information in the image,spatial pyramid pooling is used after the deep branch.The image resolution for segmentation tasks are of utmost importance.Therefore,SPSSN network takes input at full resolution 2048x1024 px,uses only 1.42 M parameters and yields 69.4% m Io U overall class accuracy without pre-training,and obtains inference speed of 59 frames per second on the cityscapes dataset.In contrast with overall class accuracy,SPSSN achieves 86.4% categorical accuracy.We achieved highresults of 64.3% on Cam Vid,with 105 FPS inference speed.The SPSSN is able to run directly on mobile devices in real-time,due to its lightweight architecture.Furthermore,we propose stream pooling module which not only take the input from deep branch but also takes input from the previous stream pooling module.Unlike stage pooling modules,only the output of last module is fused into the shallow branch in stream pooling.Finally,to demonstrate the effectiveness of the proposed network,we compared our results with state-of-the-art networks.
Keywords/Search Tags:Semantic Segmentation, Feature reuse, Stage Pooling, Stream Pooling, Realtime segmentation
PDF Full Text Request
Related items