Research On Real-Time Semantic Segmentation For Traffic Scene Of Autonomous Unmanned System

Posted on: 2021-02-01  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Z G Yang  Full Text: PDF
GTID: 1482306122479394  Subject: Control Science and Engineering
Abstract/Summary:
With the development of artificial intelligence (AI), most developed countries have launched national plans to promote it. As an important part of China's AI development plan, autonomous unmanned systems have received great attention from both companies and universities. As a result, semantic segmentation, which assigns a class label (e.g., car, traffic light) to each pixel and provides the scene understanding that unmanned systems require, has become a research hotspot. In recent years, deep-learning-based methods have greatly improved the accuracy of semantic segmentation. However, because experimental platforms offer abundant computational resources, most high-quality semantic segmentation methods do not consider model efficiency, so they can hardly be applied in practice, where the computing platform is mainly built on embedded systems. To solve this problem, this dissertation focuses on developing real-time semantic segmentation models for autonomous driving. The main contributions and innovations are summarized as follows.

We analyze in depth the sources of computational cost in deep networks and find that the cost of a convolutional neural network (CNN) comes largely from its width rather than its depth. Based on this observation, we argue that a relatively narrow network is better suited to real-time semantic segmentation. To preserve network capacity, we further propose that a narrow network should be sufficiently deep. Moreover, in view of the huge computational cost of traditional convolution, we develop a novel lightweight convolution block based on depth-wise separable convolution. Using this block, we construct a real-time feature extraction network for semantic segmentation called Narrow while Deep Network45 (NDNet45). With comparable segmentation performance, NDNet45 has 18 times fewer parameters and 36 times fewer floating-point operations (FLOPs) than the well-known ResNet18.

We present a model- and data-driven strategy for real-time semantic segmentation. From the model perspective, we adapt NDNet45 to a modified fully convolutional network 8 (FCN8) structure with learned score fusion, which improves the performance of FCN8 while introducing only minor extra computational cost. From the data perspective, we find that existing real-time semantic segmentation methods cannot yet produce satisfactory results on small objects such as traffic lights, which are critical to safe autonomous driving. Inspired by the fact that many street images contain no small objects, we propose to augment the training data by inserting additional small objects into the original images. By applying the modified FCN8 and small-object augmentation to the Cityscapes dataset, we achieve 65.7% mean intersection over union (mIoU) on the test set. More importantly, our model requires only 8.4G FLOPs on 1024 × 2048 high-resolution images, which outperforms most existing real-time methods.

Deep-learning-based semantic segmentation usually involves an up-sampling process that restores low-resolution predictions to the input resolution. Thus, although down-sampling the inputs improves the efficiency of semantic segmentation, it also increases the up-sampling rate required at the final layer and makes the recovery of spatial details more challenging. To resolve this conflict, this dissertation presents a learned up-sampling algorithm that formulates bilinear interpolation as multiple depth-wise convolutions. Experiments show that our learned up-sampling, with fewer than 200 parameters, improves the segmentation performance of DeepLabV2 by 1.2 percentage points.

Although existing real-time methods have greatly improved the efficiency of semantic segmentation, their segmentation accuracy still lags far behind that of high-quality methods. The conditional random field (CRF) has been widely used as a post-processing step to refine poor segmentation results. However, the CRF is not suited to real-time semantic segmentation because its inference process is highly complex. We therefore revisit the idea behind the CRF and simplify it to "the label of each pixel should depend not only on its own features but also on the features of its neighbors". Based on this, we propose the concept of Locally Shared Features (LSF), which enables pixels to share features with their neighbors. Experiments show that LSF improves the performance of real-time semantic segmentation while introducing almost no additional computational cost.

In summary, to address the inefficiency of current high-quality semantic segmentation methods, this dissertation develops a real-time feature extraction network and a real-time architecture for semantic segmentation. Learned up-sampling, LSF, and data augmentation based on synthetic data, each of which introduces little or no additional computational cost, are then proposed to ensure segmentation accuracy.
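
The claim that convolutional cost grows faster with width than with depth, and that depth-wise separable convolution reduces it, can be illustrated with a simple operation count. The sketch below is not the NDNet45 block from the dissertation: the layer sizes, the function names `conv_cost` and `separable_cost`, and the bias-free counting convention are all assumptions chosen for illustration.

```python
def conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates (MACs) of a standard k x k
    convolution without bias on an h x w output map."""
    params = c_in * c_out * k * k
    macs = params * h * w
    return params, macs


def separable_cost(c_in, c_out, k, h, w):
    """Same counts for a depth-wise k x k convolution followed by a
    1 x 1 point-wise convolution (both without bias)."""
    params = c_in * k * k + c_in * c_out
    macs = params * h * w
    return params, macs


# Hypothetical layer: 3 x 3 kernel, 64 channels, 64 x 128 feature map.
H, W, K = 64, 128, 3

base = conv_cost(64, 64, K, H, W)
wide = conv_cost(128, 128, K, H, W)   # double the width of one layer
deep = tuple(2 * v for v in base)     # stack two base layers (double the depth)
light = separable_cost(64, 64, K, H, W)

print("width x2 / base MACs:", wide[1] / base[1])
print("depth x2 / base MACs:", deep[1] / base[1])
print("standard / separable MACs:", round(base[1] / light[1], 2))
```

Under these assumptions, doubling the width quadruples the cost while doubling the depth only doubles it, and the separable layer needs roughly 7.9 times fewer multiply-accumulates than the standard one. The 18-fold and 36-fold reductions reported for NDNet45 come from the full network design and are not reproduced by this toy count.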
Keywords/Search Tags: Autonomous Unmanned System, Deep Learning, Semantic Segmentation, Traffic Scene, Conditional Random Fields