Research On Real-time Semantic Segmentation Of City Scene Based On Convolutional Neural Network

Posted on: 2023-12-24 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Liu | Full Text: PDF
GTID: 2532306836464454 | Subject: Engineering
Abstract/Summary:
Image semantic segmentation is one of the three fundamental tasks in computer vision and plays a crucial part in areas such as geographic information processing, autonomous driving, intelligent medicine and computational photography. Recently, with the availability of high-quality annotated data and the development of GPU hardware, large-scale convolutional neural networks have become the mainstream solution to the semantic segmentation problem. However, large-scale networks have complex structures and a large number of parameters, which limits their application on low-power mobile devices with limited computing and storage resources. It is therefore of great practical significance to study lightweight real-time semantic segmentation networks that achieve a better balance among segmentation accuracy, model size and inference speed. To this end, this thesis studies real-time semantic segmentation methods for urban scenes based on convolutional neural networks. The main work is as follows:

(1) To address the limited ability of lightweight backbones to represent global information, a fast convolutional attention network with an encoder-decoder architecture is proposed. In the encoder, a short-term dense concatenate network is used as the backbone to enhance the feature representation capability. In the decoder, a fast convolutional attention module is introduced to enhance the output feature map of each backbone stage and to learn the relationship among the three dimensions of channel, width and height. The feature maps are then up-sampled by a cascade architecture to restore the resolution. Experiments on the Cityscapes dataset show that the fast convolutional attention network achieves a 71.9% mean intersection over union at 112 frames per second on a single RTX 2080 Ti GPU, a good balance between accuracy and speed.

(2) To address the performance degradation and large parameter count of the MobileNetV2 module when it is used in a lightweight backbone, a dilated MobileNet block is introduced. By adding an extra depth-wise dilated convolutional layer, it enhances the feature representation capability while keeping the number of parameters low. Experiments on the Cityscapes dataset show that, in a lightweight backbone, the dilated MobileNet block gives better segmentation accuracy, a smaller model size and faster inference than the MobileNetV2 module.

(3) The linear combination operations used for feature fusion in lightweight networks do not consider the relationship between the fused features, which limits segmentation accuracy. To solve this problem, a lightweight network with convolutional attention feature fusion, based on an encoder-decoder architecture, is proposed for real-time semantic segmentation. In the encoder, the dilated MobileNet block is used as the basic block of the backbone. In the decoder, a convolutional attention feature fusion module is introduced. Relative attention weights that capture interactions among channel, height and width are used to aggregate feature maps, which improves the feature fusion results in lightweight networks. Specifically, the lightweight network with convolutional attention feature fusion has only 0.68 million parameters. On a single RTX 2080 Ti GPU, it achieves a 72.7% mean intersection over union on the Cityscapes dataset at 86 frames per second and a 67.9% mean intersection over union on the CamVid dataset at 105 frames per second, which demonstrates that the network achieves a favorable trade-off among segmentation accuracy, model size and inference speed.
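The accuracy figures above are reported as mean intersection over union (mIoU). As a minimal sketch of how this metric is commonly computed, the following pure-Python function builds a confusion matrix from flat label sequences and averages the per-class IoU. The function name `mean_iou`, its parameters, and the use of 255 as an ignore label (a common Cityscapes convention) are assumptions for illustration; the thesis's own evaluation code is not shown in this abstract, and benchmark scores are normally accumulated over the whole validation set rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union for two flat sequences of class labels.

    ignore_index is an assumed convention: ground-truth pixels carrying
    this label are excluded from the evaluation.
    """
    # confusion[t][p] counts pixels of ground-truth class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                      # correctly labeled pixels of class c
        fn = sum(confusion[c]) - tp               # class c pixels predicted as something else
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes; the last pixel is ignored.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")
```

In the toy example, the per-class IoU values are 1/2, 2/3 and 1, so the mean is 13/18.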
Keywords/Search Tags: convolutional neural network, urban scene, real-time semantic segmentation, attention mechanism, feature fusion