
Research On Efficient Real-Time Semantic Segmentation Methods For Autonomous Driving Scenes

Posted on: 2021-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2392330614463885
Subject: Signal and Information Processing
Abstract/Summary:
Image semantic segmentation is one of the fundamental and challenging tasks in the field of computer vision. It aims to predict the category label of each pixel in an image, and it plays an increasingly important role in a variety of vision tasks, such as driver assistance, indoor and outdoor scene parsing, and 3D scene modeling. Recently, deep convolutional neural networks (DCNNs) have become the mainstream approach to image semantic segmentation; these networks are traditionally trained on a large amount of labeled data to obtain the best-fit model. Existing methods build complex networks by stacking a large number of convolution layers. Although they have achieved significant performance gains, they suffer from serious memory consumption and latency issues, which makes them unsuitable for real-time application scenarios such as autonomous driving, augmented reality, and the Internet of Things, to name a few. Therefore, building on DCNNs, this thesis conducts intensive research on efficient semantic segmentation methods for autonomous driving scenes that deliver both accuracy and speed. The specific contents are as follows.

Firstly, from the perspective of lightweight model design, this thesis proposes a lightweight encoder-decoder method for real-time semantic segmentation. The method addresses the task by constructing an asymmetric encoder-decoder network. The encoder uses a novel residual encoding module based on factorized convolution, and the decoder uses an attention pyramid module to extract dense features. The experimental results show that, compared with state-of-the-art DCNNs, this method uses few parameters, achieves a forward inference speed of over 71 frames per second (FPS), and reaches a segmentation accuracy of 70.6% mean Intersection-over-Union (mIoU). The method thus balances segmentation accuracy against efficiency and is an efficient approach to the image segmentation task.

Secondly, by analyzing the characteristics of image semantic segmentation as a dense structured prediction task, this thesis proposes an efficient symmetrical segmentation model for real-time semantic segmentation. The method addresses the task by constructing a symmetrical encoder-decoder network. The network is built mainly by symmetrically stacking the proposed factorized convolution unit (FCU) and parallel factorized convolution unit (PFCU), which gives fast forward inference. It also uses the proposed hybrid-dilated convolution module to expand the receptive field of the network and extract deep semantic features, which strengthens the network's ability to represent features. The experimental results show that the proposed architecture runs at more than 60 FPS on a single GTX 1080 Ti GPU and achieves a segmentation accuracy of 70.7% mIoU with a model size of only 1.6 M, which makes it a feasible way to perform efficient image semantic segmentation under limited resources.

Thirdly, from the point of view of context information modeling, and drawing on the attention mechanism inspired by the human visual system, this thesis proposes an efficient real-time semantic segmentation method guided by attention. The method uses an improved pyramid attention module based on factorized convolution to extract dense context information. Meanwhile, because low-level features contain rich spatial information, the method uses a spatial attention mechanism to explicitly model the semantic relationships between spatial pixels and to guide the upsampling of high-level feature maps so that spatial information is recovered. The results of a large number of comparison and ablation experiments show that the method runs at a fast forward inference speed while maintaining high segmentation accuracy. This thesis validates the method on two urban scene benchmark datasets. On the Cityscapes benchmark, the method achieves a forward inference speed of more than 50 FPS and a segmentation accuracy of 71.3% mIoU. On the CamVid benchmark, it achieves a forward inference speed of over 90 FPS and a segmentation accuracy of 69.4% mIoU. The experimental results show that the proposed network structure can be used for efficient image semantic segmentation and can also be applied to more complex scene understanding tasks.

In conclusion, from the points of view of lightweight model design, the use of dilated convolution modules, contextual information modeling, and the use of attention mechanisms, this thesis conducts intensive research on efficient image semantic segmentation methods based on DCNNs and proposes several efficient real-time semantic segmentation methods for autonomous driving scenes. The experimental results show that the proposed methods achieve competitive segmentation accuracy while effectively improving segmentation efficiency, and that they can be applied in real scenarios.
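
The accuracy figures above are reported as mIoU. As a reading aid, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix; it is not taken from the thesis. The function names, the flattened label lists, and the `ignore_index` value of 255 (an assumption modeled on a common convention for unlabeled pixels) are illustrative only.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flattened label lists.

    ignore_index is an assumed convention for unlabeled pixels.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark evaluations accumulate the confusion matrix over the whole validation set before averaging, rather than averaging per-image scores.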
Keywords/Search Tags: Image Semantic Segmentation, Deep Convolutional Neural Networks, Autonomous Driving Scenes, Real-time Semantic Segmentation, Lightweight Model, Dilated Convolution, Attention Mechanism
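
Two of the ideas named in the abstract, factorized convolution and dilated convolution, rest on simple arithmetic, sketched below. The abstract does not give the exact layer configurations used in the thesis, so this example assumes the common factorization of a k x k convolution into a k x 1 convolution followed by a 1 x k convolution with the same number of channels. All function names and the channel count of 64 are hypothetical.

```python
def conv_params(c_in, c_out, k_h, k_w):
    """Weight count of a standard 2D convolution (bias omitted)."""
    return c_in * c_out * k_h * k_w


def factorized_conv_params(c_in, c_out, k):
    """Weight count of a k x 1 convolution followed by a 1 x k convolution.

    Assumes the intermediate feature map keeps c_out channels.
    """
    return conv_params(c_in, c_out, k, 1) + conv_params(c_out, c_out, 1, k)


def dilated_kernel_extent(k, dilation):
    """Span, in pixels along one axis, covered by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


channels = 64  # hypothetical channel count, chosen only for illustration
full = conv_params(channels, channels, 3, 3)
factorized = factorized_conv_params(channels, channels, 3)
print(f"3x3 convolution:      {full} weights")
print(f"3x1 + 1x3 factorized: {factorized} weights")
print(f"ratio:                {factorized / full:.3f}")

# Dilation widens the area a kernel covers without adding weights.
for d in (1, 2, 4, 8):
    print(f"dilation {d}: a 3-tap kernel spans {dilated_kernel_extent(3, d)} pixels")
```

Under these assumptions the factorized pair needs two thirds of the weights of the full 3 x 3 layer, while a dilation of d stretches a 3-tap kernel to span 2d + 1 pixels at no extra parameter cost. This is the general reason both techniques appear in lightweight segmentation networks.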