Research On Dual-modal Semantic Segmentation Method Based On Visible Light And Infrared Image

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: L Tian
GTID: 2568307070452854
Subject: Computer technology
Abstract/Summary:
Image semantic segmentation is one of the research hot topics in computer vision. It assigns a semantic label to each pixel of an image and can be regarded as image classification applied at the level of a single pixel. In recent years, with the rapid development of artificial intelligence, deep-learning-based semantic segmentation has gradually become a reliable and efficient method of scene analysis, with broad application prospects in fields such as autonomous driving. To handle complex road scenes, researchers have proposed semantic segmentation methods based on different sensors, among which methods based on visible-light images have been studied most extensively. However, a visible-light sensor loses target information when the propagation of natural light is obstructed, both in extreme scenes (such as night, rain, and fog) and in complex scenes (such as mutual occlusion between objects), and this degrades segmentation quality. Semantic segmentation based on RGBT (RGB-Thermal infrared images) has therefore attracted growing attention: infrared images are formed from thermal radiation, are less affected by lighting, and provide information complementary to visible-light images. Current mainstream RGBT methods mainly use a typical parallel two-branch encoder structure as the framework and explore different strategies for fusing the available modalities. However, existing dual-modal fusion strategies are mostly limited to simple weighting at the feature level, and segmentation quality is not ideal under challenges such as complex road scenes and the degradation or even loss of a modality. To address these problems, this thesis conducts an in-depth study of RGBT semantic segmentation. The main research contents are as follows:

(1) This thesis proposes a novel bimodal semantic segmentation model. Based on visible-light and thermal infrared images, the model adapts to complex environments by fusing bimodal information at both the pixel level and the feature level. It adds a pixel-level fusion module for infrared and visible light as an independent branch network, which then performs feature-level fusion with the two existing visible-light and thermal branches, so that fusion of the two modalities is considered at both levels. The fusion branch applies both spatial and channel attention mechanisms to better mine the complementary features of the two modalities at the pixel level. As a result, the representation ability of the fused features is enhanced and semantic segmentation performance is greatly improved. The model achieves average metric scores of 62.2% and 74.8% on the MF and FR-T datasets, which are 6.5% and 0.6% higher than the state-of-the-art method, and it still performs well even when the dual-modal images are degraded or fail.

(2) To address the poor segmentation of categories with few pixels, this thesis proposes a geometric structure loss function. The loss first uses the label map to extract geometric structure information between categories, including the contour information of each category and the adjacency information between categories. Then, after a dynamically balanced normalization, the contour information is used as the node information of a graph and the adjacency information as the connections between nodes, so that the geometric structure information is converted into a directed graph. Finally, the difference between the directed graph of the predicted labels and that of the ground-truth labels is computed to obtain the loss value, which is used to optimize the model's parameters. This loss dynamically adjusts the contribution of categories with few pixels to the overall loss and uses context information to improve the overall segmentation performance of the model. It improves segmentation performance on the FR-T dataset by 0.8%, and its generality is also verified on a general semantic segmentation task using the popular Cityscapes dataset.

(3) Building on the models and methods above, this thesis designs and implements an RGBT-based semantic segmentation system. The system mainly includes functional modules for data loading, parameter setting, model training, model testing, and visualization. To address the difficulty of deploying semantic segmentation models in practice, this thesis also proposes a pixel-level knowledge distillation method. It compresses the model to ease deployment, improves the speed of image segmentation, and also provides a solution to the problem of missing modalities in real applications.
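The abstract does not specify how the pixel-level knowledge distillation mentioned above is formulated. As a rough illustration only, the sketch below shows one common generic form of pixel-wise distillation: at each pixel, the student's class distribution is pushed toward the teacher's by minimizing the KL divergence between temperature-softened softmax outputs, averaged over pixels. The function names, the `temperature` parameter, and the list-of-lists input layout are all assumptions made for this example and are not taken from the thesis.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of one pixel's class logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def pixelwise_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean over pixels of KL(teacher || student).

    Hypothetical helper for illustration: both arguments are lists of
    pixels, and each pixel is a list of per-class logits.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must have the same non-zero pixel count")
    total = 0.0
    for t_pixel, s_pixel in zip(teacher_logits, student_logits):
        if len(t_pixel) != len(s_pixel):
            raise ValueError("teacher and student must have the same class count")
        log_p = log_softmax(t_pixel, temperature)  # teacher distribution
        log_q = log_softmax(s_pixel, temperature)  # student distribution
        # KL(p || q) = sum_c p_c * (log p_c - log q_c)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)
```

In this sketch the loss is zero when the student reproduces the teacher's logits exactly and grows as the per-pixel distributions diverge. In a setting like the one the abstract describes, the teacher would presumably be the full dual-modal model and the student a smaller or single-modality network, but that pairing is an assumption here.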
Keywords/Search Tags: RGBT Semantic Segmentation, Multi-Modal Fusion, Attention Mechanism, Geometrical Structure Loss, Model Compression