Research On Image Segmentation Based On Multi-task Learning Deep Neural Networks

Posted on: 2022-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: C C Wang
GTID: 2518306563977699
Subject: Electronic Science and Technology
Abstract/Summary:
Image segmentation, which includes semantic segmentation and instance segmentation, is an important topic in computer vision. Traditional image segmentation methods usually extract hand-crafted features first and then perform segmentation. Because the representation ability of such features is limited, the efficiency and accuracy of these methods are low. With the rapid development of deep learning and its wide application in computer vision, image segmentation methods based on deep neural networks have emerged. End-to-end training and large numbers of learnable parameters make them more efficient and accurate than traditional methods. However, complex network structures and large parameter counts entail heavy computation, so improving the real-time performance of image segmentation remains a major challenge. In addition, although instance segmentation and semantic segmentation are different tasks, they have similarities, and how to use instance segmentation to improve semantic segmentation is worth exploring. This thesis aims to realize image semantic segmentation based on a multi-task deep neural network, and builds a semantic segmentation network from the perspectives of real-time performance, multi-task learning and structural improvement. The main work of this thesis is as follows:

(1) Build a simple semantic segmentation head on the feature extraction backbone for real-time semantic segmentation. This thesis uses an improved ResNet-FPN as the feature extraction backbone, whose bottom-up and top-down structure has strong feature extraction capability. A simple semantic segmentation head is therefore built on the ResNet-FPN backbone. The head combines the multi-layer, multi-scale FPN features, and each feature layer passes through only a few simple convolution layers. Compared with the classic semantic segmentation method FCN, this structure is 25 FPS faster, reaching 34.6 FPS. Its mIoU on the PASCAL VOC 2012 val set is 79.25%, an increase of 13.74% over FCN.

(2) Build a multi-task learning structure that uses prototypes to assist semantic segmentation. This thesis builds a semantic segmentation head on ResNet-FPN, the backbone of the real-time instance segmentation network YOLACT. Since the prototype features in YOLACT roughly reflect the location of each instance, this thesis combines the prototype features with the semantic segmentation features to assist semantic segmentation and improve its results. At a small cost in speed, the proposed method reaches an mIoU of 83.35% on the PASCAL VOC 2012 val set, which is 2.37% higher than the mIoU obtained without task assistance.

(3) Use deformable convolution to improve the structure of the semantic segmentation head and further improve segmentation performance. Compared with standard convolution, deformable convolution adapts better to object deformation, so it is often used to improve instance segmentation performance. However, no existing real-time semantic segmentation method applies deformable convolution to the process of generating the semantic segmentation score. Considering these advantages, this thesis applies deformable convolution to the real-time semantic segmentation score generation process, replacing standard convolution with deformable convolution, and the segmentation performance improves greatly. Although deformable convolution increases the number of parameters and therefore reduces speed, this improvement achieves an mIoU of 85.69% on the PASCAL VOC 2012 val set, and still achieves real-time semantic segmentation with an mIoU of 83.15%.

Used together, the methods proposed in this thesis achieve an mIoU on the PASCAL VOC 2012 val set that is 23.63% higher than FCN, with a speed improvement of 23 FPS.
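All results in this abstract are reported as mIoU (mean intersection over union). The abstract does not describe the evaluation code, so the following is only a minimal sketch of the standard metric, not the thesis's implementation. The function names and the `ignore_index=255` default (the usual void label in PASCAL VOC annotations) are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed void label: skip unlabeled pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average TP / (TP + FP + FN) over the classes that appear at all."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                          # class c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # classes absent from both prediction and target are skipped
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened label maps for three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/2, 2/3 and 1, so `score` is their mean, 13/18. In practice the confusion matrix is accumulated over every image in the validation set before the mean is taken.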
Keywords/Search Tags: Multi-task learning, Deep neural network, Real-time semantic segmentation, Feature fusion, Deformable convolution