Research On Several Key Issues Of Image Segmentation Based On Deep Learning

Posted on: 2024-02-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B X Zhang
Full Text: PDF
GTID: 1528307064975209
Subject: Computer application technology
Abstract/Summary:
Image segmentation refers to partitioning an image or video frame into multiple segments based on its smallest elementary units, and it is a fundamental task in computer vision and image processing. As an important component of many visual understanding systems, image segmentation plays a central role in a wide range of applications, such as medical image analysis, machine vision and perception, and video surveillance. Early image segmentation methods were often based on classical image processing algorithms, such as the watershed, graph-cut, and region growing algorithms. In recent years, with the development of neural networks, deep learning methods have achieved far better results than classical methods on many computer vision tasks and have become the mainstream approach to image segmentation.

However, in complex practical application scenarios, many problems remain open. First, existing 2D semantic segmentation methods have reached real-time speed, but their accuracy still lags well behind that of non-real-time methods, so maintaining accuracy while pursuing speed is a major problem. Second, for RGB images, because neural networks inherently rely on local texture, current methods recognize weakly textured objects such as transparent and reflective objects poorly, and achieving better segmentation of such objects is also a major open problem. Third, although anchor-free single-stage methods have improved both the accuracy and the speed of 2D instance segmentation, their segmentation quality still needs improvement. Finally, because data acquisition and annotation are more difficult, 3D image segmentation, especially 3D semantic segmentation, lags far behind its 2D counterpart: poor recognition accuracy caused by sparse annotated data constrains the development of the field. This dissertation addresses these four problems and proposes new image segmentation methods. The main contributions are as follows.

1. To address the limited accuracy of existing real-time semantic segmentation algorithms, a real-time semantic segmentation method, MFENet, is proposed that enhances multi-scale features simultaneously. Image features can be divided into low-level and high-level features. Existing real-time semantic segmentation methods usually reduce feature extraction at one or both levels to gain speed, which directly leads to poor segmentation results. MFENet includes modules designed specifically to address this deficiency: a spatial edge extraction module built on edge extraction operators at the low level, and a context boost module built on channel attention at the high level. In addition, a selective refinement module fuses the features from both levels. To verify the effectiveness of MFENet, experiments are conducted on three mainstream public datasets. The results show that MFENet achieves better accuracy while maintaining real-time performance.

2. For the semantic segmentation of textureless and weakly textured objects, such as transparent and reflective objects, this dissertation proposes ShuffleTrans, a method that enhances global context information through dynamic patch-wise weight shuffling. Because neural networks perceive images locally, they are not good at recognizing object boundaries, and mainstream methods usually extract boundary features explicitly to improve the recognition of such objects. ShuffleTrans takes a different approach: it combines an operation called Weight Shuffle with dynamic convolution to enhance global context information and thereby strengthen the boundary extraction capability of the convolution itself. Experimental results on four semantic segmentation datasets of transparent and reflective objects show that ShuffleTrans is more accurate than most current methods.

3. To improve the segmentation quality of existing anchor-free single-stage real-time instance segmentation methods, DOBNet is proposed, a method that dynamically refines the final segmentation using boundary information. Mainstream real-time instance segmentation methods usually follow a "detect-then-segment" process and ignore the edges of each detected object, which can easily waste resources and lower accuracy. To overcome these two weaknesses, DOBNet adopts a "segment only" approach: it uses octave convolution to dynamically generate dual-frequency weights that segment the mask and the boundary of each object separately, and then uses the boundary to refine the mask and produce the final prediction. Experimental results on mainstream datasets show that DOBNet achieves high accuracy while maintaining real-time performance.

4. To address the shortage of annotated data for 3D semantic segmentation, Domain Adaptation (DA) is used, and a domain adaptation method, Mx2M, based on cross-modal masked modeling is proposed. Since 3D semantic segmentation datasets mostly contain corresponding 2D data, cross-modal domain adaptation methods that use both modalities have clear advantages over 3D unimodal methods. Mainstream cross-modal methods usually perform domain adaptation only through the complementarity of features between modalities. However, owing to the lack of labeled target data, this complementarity is usually unreliable when the domain gap is too large, which makes these methods less robust across different data. Mx2M uses masked modeling for cross-modal self-supervision, which provides inter-modal data augmentation, improves overall network robustness, and reduces the domain gap. Experiments on three domain adaptation scenarios show that Mx2M significantly improves domain adaptation over previous methods.
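
The masking step that underlies masked modeling, as used by the method in item 4, can be illustrated with a small sketch. This is not the dissertation's implementation: the function name `mask_patches`, the square non-overlapping patches, the zero fill value, and the fixed seed are all assumptions made for illustration. In a cross-modal setting one would presumably mask one modality and train the network to recover the hidden content with help from the other, but the abstract does not give those details.

```python
import random


def mask_patches(image, patch, ratio, seed=0):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    image: 2D list of numbers whose height and width are multiples of `patch`.
    Returns (masked_image, masked_patch_coords). Hypothetical helper, not
    taken from the dissertation.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")

    # Enumerate patches by their (row, col) index in the patch grid.
    coords = [(r, c)
              for r in range(height // patch)
              for c in range(width // patch)]

    # Pick the patches to hide; a seeded generator keeps the sketch reproducible.
    rng = random.Random(seed)
    n_masked = round(len(coords) * ratio)
    chosen = set(rng.sample(coords, n_masked))

    # Copy the image so the input is left untouched, then zero the chosen patches.
    masked = [row[:] for row in image]
    for pr, pc in chosen:
        for y in range(pr * patch, (pr + 1) * patch):
            for x in range(pc * patch, (pc + 1) * patch):
                masked[y][x] = 0
    return masked, chosen


if __name__ == "__main__":
    image = [[1] * 8 for _ in range(8)]
    masked, chosen = mask_patches(image, patch=2, ratio=0.5)
    print(len(chosen), sum(map(sum, masked)))
```

In a self-supervised setup, the returned patch coordinates would mark where a reconstruction loss is computed, so the network is only scored on content it could not see.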
Keywords/Search Tags: Image segmentation, Semantic segmentation, Instance segmentation, 3D semantic segmentation, Domain adaptation