
Research On Video Saliency Detection Based On External Spatiotemporal Feature Mapping

Posted on: 2021-01-02  Degree: Master  Type: Thesis
Country: China  Candidate: G T Wang  Full Text: PDF
GTID: 2438330611492858  Subject: Computer Science and Technology
Abstract/Summary:
Video saliency detection aims to find the most attractive regions or objects in a video sequence by simulating the human visual attention mechanism, so that limited computer hardware resources can be allocated to the more important areas. At present, mainstream methods mainly rely on hand-crafted low-level information (color, edges, etc.) to obtain video saliency results. Because the low-level details lack the guidance of high-level semantic information, the accuracy of the detection results cannot meet practical requirements. Thanks to the rapid development of deep learning networks in recent years, which can automatically learn both low-level and high-level features, deep learning and semi-deep-learning approaches (deep learning combined with traditional methods) have advanced rapidly. However, research on video saliency detection with these two schemes is still in its infancy, mainly because of the lack of datasets suitable for training deep networks and the lack of an effective combination of spatial and temporal information. Therefore, this thesis explores both deep learning and semi-deep-learning approaches and proposes the following solutions:

(1) A Long-term Spatial-Temporal Information (LSTI) modeling scheme. Owing to the absence of long-term information, conventional methods that use only short-term spatiotemporal information frequently suffer from hollow effects induced by intermittent movements and false-alarm detections caused by external disturbances. Our method recovers the LSTI from high-quality low-level saliency estimations, identified by a newly designed fast quality assessment (FQA) scheme, by performing non-local inter-frame alignments guided by SIFT-Flow. We then use a novel deep saliency framework that takes full advantage of the newly available LSTI to learn discriminative information about the salient foregrounds while maintaining strong spatiotemporal saliency consistency, achieving high-performance video saliency detection.

(2) A fast "Full Interactive" Spatial-Temporal (FIST) deep learning scheme for spatiotemporal features. Current mainstream methods formulate video saliency from two independent branches, i.e., a spatial branch and a temporal branch. As a complementary component, the main task of the temporal branch is to intermittently focus the spatial branch on regions with salient movements. Although the overall video saliency quality therefore depends heavily on the spatial branch, the performance of the temporal branch still matters. Thus, the key to improving overall video saliency is to boost the performance of both branches efficiently. This thesis proposes a novel spatiotemporal network that achieves such improvement in a fully interactive fashion. We integrate a lightweight temporal model into the spatial branch to coarsely locate spatially salient regions that are correlated with trustworthy salient movements. Meanwhile, the spatial branch recurrently refines the temporal model in a multi-scale manner. In this way, the spatial and temporal branches interact with each other and improve each other's performance.

The proposed methods are easy to implement yet effective, achieving high-quality video saliency detection. Qualitative and quantitative comparisons with state-of-the-art (SOTA) methods show that the proposed methods reach leading performance.
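The long-term fusion idea in scheme (1) can be illustrated with a toy sketch: per-frame saliency estimations are scored by a quality measure, and a quality-weighted long-term reference is blended back into each frame to suppress hollow effects and false alarms. The `quality_weight` function below is only a stand-in for the thesis's FQA scheme, and the maps are assumed to be already aligned across frames (the thesis uses SIFT-Flow for this); this is an illustrative sketch, not the thesis's implementation.

```python
import numpy as np

def quality_weight(sal_map):
    # Stand-in for the thesis's FQA measure: reward maps whose
    # foreground (values >= 0.5) is well separated from the background.
    fg = sal_map[sal_map >= 0.5]
    bg = sal_map[sal_map < 0.5]
    if fg.size == 0 or bg.size == 0:
        return 1e-6
    return float(fg.mean() - bg.mean())

def long_term_fuse(sal_maps, alpha=0.5):
    # sal_maps: (T, H, W) per-frame low-level saliency estimations,
    # assumed already aligned across frames.
    weights = np.array([quality_weight(m) for m in sal_maps])
    weights = np.clip(weights, 1e-6, None)
    weights /= weights.sum()
    # Long-term reference: quality-weighted average over all frames.
    reference = np.tensordot(weights, sal_maps, axes=1)
    # Blend each frame toward the reference, pulling degraded frames
    # back toward the temporally consistent result.
    return alpha * sal_maps + (1 - alpha) * reference
```

Because low-quality frames receive near-zero weights, a frame corrupted by a disturbance is dominated by the long-term reference rather than its own unreliable estimation.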
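The mutual coupling in scheme (2) can likewise be sketched in toy form: the temporal stream produces a motion attention that focuses the spatial stream, and the refined spatial response is fed back to update the temporal stream. The gating and the feedback rule below are illustrative assumptions standing in for the actual learned network, not the thesis's architecture.

```python
import numpy as np

def fist_step(spatial_feat, temporal_feat):
    # Toy version of the "full interactive" coupling:
    # 1) the temporal stream gates the spatial stream toward
    #    regions with salient motion;
    gate = 1.0 / (1.0 + np.exp(-temporal_feat))   # motion attention in (0, 1)
    refined_spatial = spatial_feat * (1.0 + gate)
    # 2) the refined spatial response is fed back to update
    #    the temporal stream (a simple average here).
    refined_temporal = 0.5 * temporal_feat + 0.5 * refined_spatial
    return refined_spatial, refined_temporal
```

Applying `fist_step` repeatedly mimics the recurrent, multi-scale refinement described above: each pass sharpens the spatial response where motion is trustworthy and re-grounds the temporal estimate in the refined spatial features.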
Keywords/Search Tags: Video Saliency Detection, Spatial-temporal Feature Consistency, Long-term Spatiotemporal Information, Full Interactive Spatiotemporal Information