Research Of Salient Object Detection Based On Deep Optimal Spatio-temporal Consistence

Posted on: 2021-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Li
Full Text: PDF
GTID: 2518306560953049
Subject: Master of Engineering
Abstract/Summary:
Video salient object detection aims to quickly and efficiently extract the region of interest from a video, which helps subsequent target extraction or localization. Camera jitter, illumination variation, abrupt shot changes, and camera movement often cause distortion, blur, and interference from irrelevant background information, all of which make salient object detection very challenging. Methods in this field fall into two groups: those based on hand-crafted features and those based on deep learning. The unsupervised methods based on hand-crafted features achieve lower accuracy than the deep learning methods. However, most deep learning methods are supervised and require large amounts of annotated data. Although some methods combine hand-crafted features with deep learning, the problem of low detection accuracy in complex scenes has not been solved. To address these problems, this thesis proposes a salient object detection model based on deep optimal spatio-temporal consistence. Combining a deep learning model with spatio-temporal consistence can improve detection accuracy in complex scenes. The main work and innovations of this thesis are as follows.

First, spatio-temporal salient object detection based on hand-crafted features is studied. A salient object detection method based on spatio-temporal consistence is proposed to suppress background noise and extract common salient objects. The method integrates three modules, covering the spatio-temporal domain, the spatial domain, and the temporal domain. In the spatio-temporal domain, global motion information is exploited to highlight moving regions. To limit interference from moving backgrounds, background weighting and adaptive fusion with the spatio-temporal gradient flow field map are applied to obtain the salient regions in the spatio-temporal domain (the spatio-temporal saliency map). In the spatial domain, local contrast and global contrast are adopted to distinguish salient regions from the background based on their features (the spatial saliency map), which yields better-defined object edges in the spatial saliency map. In the temporal domain, the same salient object is tracked across adjacent frames by comparing the similarity of their color and motion features (the temporal saliency map). Finally, the saliency maps produced by the three modules are nonlinearly integrated into a video saliency map to enhance spatio-temporal consistence.

Second, to address the blurred object edges and missing detail of the model based on hand-crafted features, a dual-channel salient object refinement network is proposed to further improve the integrity and accuracy of the detected salient object. The network consists of two parallel channels: an initial feature channel and an optimized feature channel. Neural Network Embedding (NNE) is used to obtain features of pixels and of the foreground/background from the original image and the video saliency map. All features are mapped into the same metric space, where nearest neighbors are matched. In the initial feature channel, VGG-16 serves as the feature extractor that perceives local image information. A Feature Optimization Module (FOM) is built on top of the initial feature channel to strengthen the abstract descriptive power of the features. Together they form the optimized feature channel, which narrows the gap between mapped features of similar classes. To obtain better classification, the loss function combines the pixel classification results of both channels, which prompts the network to capture objects with clear edges and fine details accurately and effectively.

Qualitative and quantitative evaluation experiments are performed on the DAVIS, SegTrack v2, and ViSal datasets. The results show that the model based on deep optimal spatio-temporal consistence further improves on the accuracy of the model based on spatio-temporal consistence alone. Compared with state-of-the-art video salient object detection algorithms, the proposed algorithm is more robust at detecting salient objects in complex scenes.
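The final step of the hand-crafted-feature model, in which the spatio-temporal, spatial, and temporal saliency maps are nonlinearly integrated into one video saliency map, can be illustrated with a minimal sketch. The abstract does not state the fusion formula, so the rule below (the mean of the three maps plus their pixel-wise product, rescaled to [0, 1]) is purely an assumption chosen for illustration, and the names `normalize` and `fuse_saliency` are hypothetical rather than taken from the thesis.

```python
from typing import List

SaliencyMap = List[List[float]]  # one frame, rows of per-pixel saliency values


def normalize(m: SaliencyMap) -> SaliencyMap:
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse_saliency(spatio_temporal: SaliencyMap,
                  spatial: SaliencyMap,
                  temporal: SaliencyMap) -> SaliencyMap:
    """Assumed nonlinear fusion (NOT the thesis formula).

    The additive term keeps regions found by any single module; the
    multiplicative term boosts pixels on which all three modules agree.
    """
    a, b, c = normalize(spatio_temporal), normalize(spatial), normalize(temporal)
    fused = [
        [(x + y + z) / 3.0 + x * y * z for x, y, z in zip(ra, rb, rc)]
        for ra, rb, rc in zip(a, b, c)
    ]
    return normalize(fused)


if __name__ == "__main__":
    st = [[0.0, 1.0], [0.5, 0.0]]
    sp = [[0.0, 1.0], [0.0, 0.5]]
    tm = [[0.0, 1.0], [0.5, 0.5]]
    for row in fuse_saliency(st, sp, tm):
        print(row)
```

In this sketch, a pixel marked salient by all three modules ends up with the highest fused value, while a pixel supported by only one or two modules is kept but down-weighted, which is one simple way to favor spatio-temporally consistent regions.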
Keywords/Search Tags:video saliency, salient object detection, contrast, boundary connectivity, spatio-temporal consistence