| Semantic segmentation is an active research topic in computer vision and plays an important role in applications such as autonomous driving, medical imaging, and remote sensing. Semantic segmentation techniques have evolved from traditional image segmentation methods to end-to-end models based on deep learning, which have broken through earlier bottlenecks in segmentation accuracy. However, for scenarios that require real-time semantic segmentation, such as autonomous driving systems, improving accuracy alone is not enough: the model must also be lightweight and fast enough to meet real-time constraints. Achieving high accuracy while maintaining high inference speed has therefore become a key challenge for real-time semantic segmentation. To address it, we propose a hybrid dilated grouping network for real-time semantic segmentation that improves segmentation accuracy while also taking inference speed into account. The main contributions of this thesis are as follows. (1) To build a lightweight real-time semantic segmentation model, this thesis designs a hybrid dilated grouping module. To reduce the number of model parameters and speed up inference, we replace ordinary two-dimensional convolutions with factorized convolutions and depth-wise separable convolutions. Because simply reducing the number of parameters may degrade segmentation performance, we further introduce dilated convolutions to extract multi-scale spatial information. The hybrid dilated grouping module can be viewed as applying factorized and dilated convolutions depth-wise (per channel), which reduces the number of parameters and improves inference speed while still extracting both local and wider contextual information. (2) Lightweight networks find it harder than large, complex networks to extract deep features, and the dilated convolutions with different dilation rates in the hybrid dilated grouping module focus on capturing multi-scale spatial information. To extract more channel information and further improve the feature representation ability of the network, we introduce a lightweight channel attention module that captures the correlations between channels; it effectively improves segmentation accuracy without significantly increasing the number of model parameters. (3) Based on the hybrid dilated grouping module and the attention module, this thesis designs a real-time semantic segmentation network with an encoder-decoder structure. The encoder, composed of hybrid dilated grouping modules, attention modules, and downsampling modules, extracts fine-grained image features; the decoder uses only simple upsampling to restore the image resolution. After shallow features and deep high-level semantic information are obtained, skip connections fuse the feature branches from different stages to improve segmentation accuracy. To verify the effectiveness of our model, we conducted experiments on two widely used datasets, Cityscapes and CamVid. We first conducted ablation experiments on Cityscapes to verify the contribution of each module, and then compared our method with other approaches; the results show that the proposed method achieves a good balance between segmentation accuracy and inference speed. |
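To make the parameter savings described in contribution (1) concrete, the following back-of-the-envelope sketch counts the weights of the convolution types the abstract mentions and the receptive field gained by dilation. It is not the thesis's implementation: the equal input/output channel widths, the omission of biases, the 1x1 point-wise mixing after depth-wise layers, and all function names are illustrative assumptions.

```python
"""Back-of-the-envelope parameter counts for the convolution types named above.

Assumptions (illustrative, not taken from the thesis): input and output
channel counts are equal (c), biases are ignored, and the depth-wise
variants are followed by a 1x1 point-wise convolution to mix channels.
"""


def standard_conv(c: int, k: int) -> int:
    # Ordinary k x k convolution: each of the c filters spans all c input channels.
    return c * c * k * k


def factorized_conv(c: int, k: int) -> int:
    # k x 1 convolution followed by a 1 x k convolution, both full (not depth-wise).
    return 2 * c * c * k


def depthwise_separable_conv(c: int, k: int) -> int:
    # k x k depth-wise convolution (one filter per channel) + 1 x 1 point-wise convolution.
    return c * k * k + c * c


def depthwise_factorized_conv(c: int, k: int) -> int:
    # k x 1 and 1 x k depth-wise convolutions + 1 x 1 point-wise convolution.
    return 2 * c * k + c * c


def effective_kernel_size(k: int, dilation: int) -> int:
    # Dilation inserts (dilation - 1) gaps between kernel taps: it enlarges the
    # receptive field without adding any parameters.
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    c, k = 64, 3
    for name, fn in [
        ("standard 3x3", standard_conv),
        ("factorized 3x1 + 1x3", factorized_conv),
        ("depth-wise separable", depthwise_separable_conv),
        ("depth-wise factorized", depthwise_factorized_conv),
    ]:
        print(f"{name:24s} {fn(c, k):6d} parameters")
    for d in (1, 2, 4, 8):
        print(f"dilation {d}: effective kernel {effective_kernel_size(k, d)}")
```

With c = 64 and k = 3, the script reports 36864, 24576, 4672 and 4480 parameters respectively, and effective kernel sizes of 3, 5, 9 and 17 for dilation rates 1, 2, 4 and 8. Under these assumptions most of the saving comes from going depth-wise; factorizing the depth-wise kernel trims only a little more because the c x c point-wise term dominates, while dilation widens the receptive field at no parameter cost.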
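Contribution (2) does not say which lightweight channel attention design is used. As one plausible example, the sketch below assumes an ECA-style module (global average pooling, a small 1-D convolution across the channel axis, and a sigmoid gate), written in plain Python on nested lists; the function names and kernel weights are hypothetical, not trained values from the thesis.

```python
"""ECA-style channel attention in plain Python (illustrative sketch).

Assumption: the thesis only says "a lightweight channel attention module";
an ECA-style design is used here as one example. Weights are hypothetical.
"""
import math
from typing import List

Channel = List[List[float]]  # one H x W feature map


def global_average_pool(x: List[Channel]) -> List[float]:
    # Squeeze each H x W channel to a single descriptor.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def conv1d_same(v: List[float], w: List[float]) -> List[float]:
    # 1-D convolution over the channel axis with zero padding, so neighbouring
    # channels interact; it has only len(w) weights, independent of channel count.
    pad = len(w) // 2
    out = []
    for i in range(len(v)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + j - pad
            if 0 <= idx < len(v):
                acc += wj * v[idx]
        out.append(acc)
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(x: List[Channel], w: List[float]) -> List[Channel]:
    # Rescale each channel by a gate in (0, 1) computed from channel statistics.
    gates = [sigmoid(z) for z in conv1d_same(global_average_pool(x), w)]
    return [[[g * val for val in row] for row in ch] for g, ch in zip(gates, x)]


if __name__ == "__main__":
    # Hypothetical 4-channel, 2 x 2 feature map and a kernel of size 3.
    feats = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
    out = channel_attention(feats, w=[0.2, 0.5, 0.2])
    print([round(ch[0][0], 3) for ch in out])
```

The 1-D kernel has only as many weights as its length (three here), regardless of the number of channels, which is why this kind of attention adds few parameters; an SE-style block with fully connected layers would be a heavier alternative.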