
Research On Semantic Segmentation Based On Convolutional Neural Network

Posted on: 2022-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: F H Zhai  Full Text: PDF
GTID: 2518306527983129  Subject: Computer Science and Technology
Abstract/Summary:
As a basic computer vision task, image semantic segmentation essentially uses pixel-level classification to divide an image into several distinct, meaningful regions. From a macro point of view, semantic segmentation is a necessary step for scene understanding and a process that turns an image from the concrete to the abstract. It has very important application value in fields such as autonomous driving, medical diagnosis, and remote sensing image analysis. In recent years, with the prevalence of deep learning, semantic segmentation methods based on fully convolutional neural networks have attracted extensive attention as a relatively new research direction. Through the joint efforts of countless researchers, semantic segmentation technology based on deep learning has become noticeably mature, but several problems still need to be optimized. First, in order to obtain a larger receptive field, the resolution of the features is reduced, which leads to the loss of detailed information. Second, the stacking of convolutions in deep feature extraction networks makes the feature representation of pixels around object edges inconspicuous. Third, the feature information from different network layers can help optimize the final segmentation result, but simply fusing these features may not bring the desired effect. In view of these problems, this paper studies existing models and proposes corresponding improvement strategies. The main research contents and achievements are as follows:

(1) To solve the loss of detailed information caused by downsampling in deep networks, a dual-stream semantic segmentation algorithm with edge feature optimization based on DeepLab V3+ is proposed. The model takes the traditional RGB image and its first-order gradient image as inputs, and processes semantic information and edge information simultaneously through a parallel network structure. During feature extraction, a new feature fusion module is designed to realize interaction between the two streams, which helps the structurally simple edge stream quickly filter out useless noise in the first-order gradient image. In the decoding stage, to obtain more precise segmentation results, the algorithm uses the learned edge information to help the semantic stream recover the lost spatial edge information through feature fusion.

(2) To solve the feature smoothing caused by convolution operations, a dual-stream network with a sparse attention model for semantic segmentation is proposed in the third chapter. The new network improves on the dual-stream semantic segmentation network above; mainly, a newly designed sparse attention module is embedded in the decoding phase to help the network optimize the feature representation of certain vectors and improve segmentation performance. Although the classic self-attention model can effectively strengthen the representation of semantic features through dense similarity modeling, it brings a huge resource consumption that is unacceptable for researchers short of computing resources. Under the premise of preserving performance, the new algorithm sparsifies the two key matrices of the self-attention model, Q and K, and thereby reduces the computing resource consumption through sparse similarity modeling. In addition, to ensure that sparse attention can still capture dense context features, and inspired by the K-means algorithm, this paper proposes a class-attention model that optimizes the distance between each vector and its classification center, and embeds it into the sparse attention module.
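The abstract does not give implementation details for the sparse attention module, so the following PyTorch sketch only illustrates the general idea under stated assumptions: instead of the dense N×N similarity of classic self-attention, every pixel vector attends to a small set of learnable class centers (standing in for the K-means-style classification centers), so the similarity matrix shrinks from N×N to N×K. The class name `SparseClusterAttention`, the parameter `num_centers`, and the use of learnable centers are illustrative assumptions, not the author's actual design.

```python
import torch
import torch.nn as nn


class SparseClusterAttention(nn.Module):
    """Illustrative sparse attention (assumption, not the thesis module):
    each pixel attends to a small set of learned cluster/class centers
    instead of all N = H*W positions, so the similarity matrix is N x K
    rather than N x N."""

    def __init__(self, channels, num_centers=16):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Linear(channels, channels)
        self.value = nn.Linear(channels, channels)
        # Learnable centers play the role of K-means class centers.
        self.centers = nn.Parameter(torch.randn(num_centers, channels))
        self.scale = channels ** -0.5

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # (B, N, C)
        k = self.key(self.centers)                          # (K, C)
        v = self.value(self.centers)                        # (K, C)
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)  # (B, N, K)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                      # residual connection


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    print(SparseClusterAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```

With N = H·W pixels and K centers, the similarity computation drops from O(N²·C) to O(N·K·C), which is the kind of saving the sparse similarity modeling described above is aiming for.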
(3) To solve the problems of detailed information loss and feature fusion in deep networks, a multi-scale feature fusion network with edge feature learning for semantic segmentation is proposed on the basis of the existing network. Owing to their high resolution and small receptive field, low-level features contain more useful location and detail information, but their ability to abstract semantics is poor. On the contrary, owing to their large receptive field, high-level features contain abundant semantic information, but their resolution is low and their ability to perceive details is poor. This method, also built on the classic semantic segmentation network DeepLab V3+, integrates useful semantic information and spatial edge information from different network layers through a multi-scale feature fusion module that works first from shallow to deep and then from deep to shallow. In the decoding stage, a newly designed multi-task decoding network helps the multi-scale feature fusion module filter out useless noise more accurately through pre-output supervised learning of edge features and semantic features. In the final segmentation stage, to improve the overall performance of the network, the method further refines the segmentation results by combining the learned semantic features and edge features. Finally, the excellent performance of the proposed algorithms is demonstrated by detailed experiments on public datasets.
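The abstract describes the multi-scale feature fusion module only at a high level (a shallow-to-deep pass followed by a deep-to-shallow pass), so the PyTorch sketch below is a hedged illustration of that pattern rather than the thesis's actual module: backbone stage features are projected to a common width, detail is accumulated downwards in a shallow-to-deep pass, and semantics are injected back in a deep-to-shallow pass. The class name `BidirectionalFusion`, the channel widths, and the pooling/interpolation choices are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BidirectionalFusion(nn.Module):
    """Illustrative multi-scale fusion (assumption, not the thesis module):
    stage features are mixed shallow-to-deep, then deep-to-shallow, so
    low-level detail/edge cues and high-level semantics are fused twice."""

    def __init__(self, in_channels, mid_channels=128):
        super().__init__()
        # 1x1 convs project every backbone stage to a common channel width.
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, mid_channels, kernel_size=1) for c in in_channels]
        )
        self.smooth = nn.ModuleList(
            [nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, feats):                  # feats ordered shallow -> deep
        feats = [p(f) for p, f in zip(self.proj, feats)]

        # Shallow-to-deep pass: downsample and accumulate detail into deeper maps.
        for i in range(1, len(feats)):
            down = F.adaptive_avg_pool2d(feats[i - 1], feats[i].shape[-2:])
            feats[i] = feats[i] + down

        # Deep-to-shallow pass: upsample and inject semantics into shallower maps.
        for i in range(len(feats) - 2, -1, -1):
            up = F.interpolate(feats[i + 1], size=feats[i].shape[-2:],
                               mode="bilinear", align_corners=False)
            feats[i] = feats[i] + up

        return [s(f) for s, f in zip(self.smooth, feats)]


if __name__ == "__main__":
    stages = [torch.randn(1, 256, 64, 64),
              torch.randn(1, 512, 32, 32),
              torch.randn(1, 1024, 16, 16)]
    outs = BidirectionalFusion([256, 512, 1024])(stages)
    print([tuple(o.shape) for o in outs])
```

In the thesis, the fused features would additionally be supervised by the multi-task decoding heads (edge and semantic pre-outputs) mentioned above; that supervision is omitted here to keep the sketch self-contained.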
Keywords/Search Tags:Image semantic segmentation, Deep learning, Edge optimization, Dual-stream, Sparse attention, Multi-scale feature fusion