
Research On Semantic Segmentation Algorithm Based On Non-local Region Pixel Enhancement And Feature Information Association

Posted on: 2024-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wei
Full Text: PDF
GTID: 2568307058481814
Subject: Engineering
Abstract/Summary:
Semantic segmentation is one of the key tasks in computer vision. As society and the economy develop, more and more applications need to obtain information and concepts by observing and analyzing images, and the importance of semantic segmentation as a core computer vision problem is becoming increasingly prominent. The technology is widely used in fields such as geographic information systems, autonomous driving, medical image analysis, and robotics. It provides richer semantic information, enabling more accurate image understanding and supporting fields such as autonomous driving, robotics, and image search engines. Semantic segmentation is the task of classifying each pixel at the semantic level. Research has shifted its focus to the challenge of balancing real-time performance against high precision. The accuracy and speed of semantic segmentation depend significantly on the quality of the encoder and decoder. Therefore, this thesis proposes a semantic segmentation network to improve the trade-off between speed and accuracy. The main research work of this thesis includes:

(1) We propose a scene semantic segmentation network based on non-local region pixel enhancement (NRPENet). Building on the ANNet network, we design two new modules: CERM (Channel Enhancement Regions Module) and PEPM (Position-enhanced Pixels Module). NRPENet has three innovations. First, after the fourth stage of the backbone network, a convolution module compresses the feature channels of ANNet's fourth stage to the number of dataset categories; a normalization function then yields a category probability feature map, and a cross-entropy loss function supervises the formation of the category regions. Second, two matrix multiplications are reduced to one: every channel of the category probability feature map is matrix-weighted against every channel of the backbone output feature map. This operation not only takes the relationships between feature-map channels into account but also enhances the regional pixel representation of the feature map. Third, PEPM extends the single input of APNB to two inputs, which make full use of the feature maps output by CERM and by the backbone network. By establishing pixel dependencies along the spatial dimensions of the feature map, PEPM avoids the loss of feature-map resolution caused by multi-stage feature extraction. Using both the CERM output and the backbone output improves the utilization of the key information in the feature maps. In short, combining CERM and PEPM not only improves the accuracy of the segmentation network but also effectively reduces its parameters and computation, improving its speed and efficiency.

(2) We propose an image semantic segmentation algorithm based on feature information association, which consists of two parts: information association between channel feature maps and information association between pixels in the feature-map space. Current convolutional neural networks need many stacked convolutional blocks before the receptive field covers the whole image and the necessary information can be extracted. However, stacking too many convolutional blocks not only increases the number of parameters but also leads to overfitting. Multi-layer convolution is poor at extracting the global information of feature maps, although it is strong at extracting spatial detail. Stacking too many convolutional layers also causes resolution loss and segmentation errors at object edges and for small targets. Furthermore, objects are easily affected by illumination, so that different classes are merged into one class while a single class is split into several. To solve this problem, this thesis proposes a feature weighted selection module (FWS). It weights the input feature map with a generated weight vector, assigning small weights to objects that are easy to segment and large weights to objects that are difficult to segment, which balances the accuracy across categories. After the FWS module, feature information association is established among the channel feature maps, so that each channel feature map can make an accurate judgment based on the context of the other feature maps. In addition, the correlation between pixels of the feature maps is studied at the pixel level. Transformer consists mainly of multi-layer attention, which is strong at extracting global information but insensitive to spatial detail. In this thesis, we combine the advantages of Transformer and convolutional blocks to achieve high-precision semantic segmentation without increasing the model parameters. Compared with other classical algorithms, our method is superior in both accuracy and real-time performance.
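The first two CERM steps described in part (1) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function name `class_region_descriptors`, the tensor sizes, the use of a plain weight matrix in place of the convolution module, and the choice of a per-pixel softmax over the class axis as the "normalization function" are all assumptions made for illustration. The abstract does not say how the result of the single matrix multiplication is fed back into the feature map, so the sketch stops at that product.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def class_region_descriptors(features, class_weights):
    """Hypothetical sketch of the first CERM steps.

    features:      (C, H, W) backbone output feature map.
    class_weights: (K, C) matrix standing in for a 1x1 convolution
                   that compresses C channels to K categories.
    Returns the (K, H, W) category probability map and the (C, K)
    product of backbone channels with category channels.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)        # (C, HW)

    # Channel compression: a 1x1 convolution is a matrix product
    # applied independently at every pixel.
    logits = class_weights @ flat            # (K, HW)

    # Assumed normalization: softmax over categories at each pixel.
    probs = softmax(logits, axis=0)          # (K, HW)

    # The single matrix multiplication: every backbone channel is
    # weighted against every category probability channel.
    regions = flat @ probs.T                 # (C, K)

    return probs.reshape(-1, h, w), regions


# Illustrative sizes only: 8 channels, 3 categories, a 4x5 feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 5))
class_weights = rng.standard_normal((3, 8))
probs, regions = class_region_descriptors(features, class_weights)
print(probs.shape, regions.shape)
```

In training, the probability map would be the quantity supervised by the cross-entropy loss against the ground-truth labels; that step is omitted here because it needs label data.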
Keywords/Search Tags: Matrix multiplication, Regional pixel representation, Cross entropy loss function, Feature information association, Combining Convolution and Transformer