Research On Salient Object Detection Method Based On Deep Feature Learning And Guidanc

Posted on: 2024-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: X C Wang
Full Text: PDF
GTID: 2568307106975979
Subject: Electronic information
Abstract/Summary:
Salient object detection aims to identify the most visually attractive regions in images or videos, so that useful information can be extracted from complex scenes for subsequent tasks. As a key technology in computer vision, it is widely used in intelligent security, aerospace, human-computer interaction, and other fields. Most current salient object detection methods are based on deep learning and have made good progress. However, some problems remain, such as insufficient attention to the edge details of salient objects and insufficient learning of spatial and temporal features. This thesis addresses these problems in two specific tasks, image salient object detection and video salient object detection, and completes the following three works.

(1) Fully supervised image salient object detection depends on large-scale pixel-level annotations, which are costly to produce and difficult to obtain. The first work therefore studies weakly supervised image salient object detection using only image-level category labels. Current weakly supervised methods pay little attention to the boundaries of salient objects and cannot guarantee the quality of details. To address this, an edge depth mining network is proposed. First, two modules, coarse edge generation and fine edge generation, produce fine edge feature maps of the object from coarse to fine and from simple to accurate. Next, a pseudo label generation module uses the fine edge feature maps to generate initial pseudo labels, then optimizes them with a fusing and denoising mechanism that enhances foreground features and suppresses background noise. Finally, the model is trained for saliency detection using the pseudo labels. Experiments on four public datasets demonstrate the effectiveness and superiority of the model.

(2) The second work studies video salient object detection. Current methods do not adequately exploit the spatial and temporal information that a convolutional neural network extracts at different depths. To address this, a spatial temporal aggregation guidance network is proposed. For spatial information, a spatial feature aggregation module based on an encoder-decoder structure learns the spatial information of the frames in a sequence, and a global feature aggregation strategy improves its feature extraction ability. For temporal information, considering that the shallow features of a convolutional neural network carry better details while its deep features carry stronger semantics, a multi-level temporal feature mining module is designed. It consists of multiple independent, parallel components that learn these features simultaneously, which strengthens the ability to depict and identify temporal information. At each convolutional depth, the module first learns initial temporal information, and an inter-frame feature guidance strategy then further extracts the temporal correlation. Experiments on four public datasets show that the model achieves good detection performance.

(3) The third work addresses the limited ability of current video salient object detection methods to learn temporal information continuously, and proposes a spatial temporal progressive learning network. In the spatial domain, a static feature mining module uses a U-shaped structure to encode and decode video frames. In the temporal domain, a motion feature progressive learning module considers both the subject parts and the deformation regions of moving objects. It captures the temporal correlation between two frames through guidance learning with inter-frame features. The module consists of two parts: a motion subject feature extractor and a refinement feature extractor. The former uses previous frames to enhance the subject-part features of the current frame. The latter uses an inter-frame attention mechanism to learn the position and channel information of moving objects, capturing their temporal correlation and motion tendency so that the module can predict the deformation regions of the current frame; it also further optimizes the object features. Experimental results on four public datasets show that the model performs well in video salient object detection.
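
The abstract does not specify how the fusing and denoising mechanism of work (1) operates, so the following is only an illustrative sketch under stated assumptions, not the thesis method. It assumes that fusing is a pixel-wise mean over several initial pseudo-label maps and that denoising zeroes weak responses below a threshold and rescales the rest. The names `fuse_maps`, `denoise_map`, `refine_pseudo_label`, and the `threshold` parameter are hypothetical.

```python
from typing import List

# A saliency map as rows of values in [0, 1] (assumed representation).
Map = List[List[float]]


def fuse_maps(maps: List[Map]) -> Map:
    """Fuse same-sized maps by pixel-wise mean (an assumed fusion rule)."""
    if not maps:
        raise ValueError("need at least one map")
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
        for r in range(rows)
    ]


def denoise_map(fused: Map, threshold: float = 0.3) -> Map:
    """Suppress weak (background) responses and rescale the remainder.

    Pixels below `threshold` are set to 0; the surviving pixels are divided
    by the peak value so the strongest foreground pixel becomes 1.
    """
    kept = [[v if v >= threshold else 0.0 for v in row] for row in fused]
    peak = max(max(row) for row in kept)
    if peak == 0.0:
        return kept  # nothing survived the threshold
    return [[v / peak for v in row] for row in kept]


def refine_pseudo_label(maps: List[Map], threshold: float = 0.3) -> Map:
    """Fuse the initial pseudo labels, then denoise the fused map."""
    return denoise_map(fuse_maps(maps), threshold)
```

In this sketch, averaging reinforces regions on which the initial maps agree, and thresholding removes isolated low-confidence responses; a learned mechanism such as the one the thesis describes would replace both fixed rules.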
Keywords/Search Tags: salient object detection, weakly supervised learning, edge depth mining, spatial temporal information