
Video Saliency Detection Guided By Motion Information

Posted on: 2022-03-08    Degree: Master    Type: Thesis
Country: China    Candidate: J Song    Full Text: PDF
GTID: 2518306566990909    Subject: Computer Science and Technology
Abstract/Summary:
Computer vision technology has developed rapidly in recent years and now plays an important role in people's daily work and life. However, the speed at which computers can process video has not kept pace with the rapid growth of video data, which makes it impossible to rely on computers to filter out and process the video content that is truly useful. Pre-processing video content has therefore become an urgent need, and video salient object detection is the most effective way to achieve it. The main purpose of video saliency detection is to segment the regions that attract human visual attention from the background.

Current CNN-based methods mainly use two-stream networks to fuse spatio-temporal information, but their performance is limited by the temporal branch: when the object is stationary or the background is shaking, it is difficult to obtain reliable motion information from an optical flow network alone. In addition, because large datasets for training complex networks are lacking, the models tend to overfit and cannot reach optimal performance. This thesis therefore proposes two solutions to these problems:

1. We propose a video saliency detection algorithm based on motion quality perception. The algorithm first introduces the concept of "motion quality" and uses a convolutional neural network to evaluate it. The resulting motion quality labels are used to shield the final saliency result from the negative influence of low-quality motion information. To save labeling cost, a semi-supervised scheme is adopted to supervise network training. We also propose a general learning scheme that can improve the performance of any state-of-the-art model.

2. We propose a novel model named Multi-Stream Network Consistency (MSNC), a three-stream network in which three sub-branches extract spatial, temporal, and prior deep features respectively. The key technical innovation is the newly proposed prior stream, which improves the motion saliency quality estimated by the temporal sub-branch. After the three types of deep features are obtained, a novel fusion scheme that uses multi-stream consistency as the fusion indicator combines them into the final video saliency detection result. We further devise a cyclic training strategy to avoid overfitting in each sub-branch.

We have carried out extensive quantitative experiments on five publicly available datasets, and the results show the superiority of our methods over state-of-the-art (SOTA) models on three metrics. Compared with SOTA models, the proposed algorithms are more robust and effective for video saliency detection. Taking motion information as the starting point, whether by evaluating motion quality or by compensating for the shortcomings of optical flow, they suppress background noise from the appearance branch and make full use of motion information to better locate salient objects.
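The following is a minimal sketch of how motion-quality gating and consistency-weighted fusion might look at inference time, assuming the three streams output per-pixel saliency maps as NumPy arrays; the gating threshold, the weighting rule, and all function names are illustrative assumptions, not the thesis's published formulation.

# Illustrative sketch only: the abstract does not give the fusion equations,
# so the consistency weighting, the quality threshold, and the function names
# below are assumptions rather than the thesis's actual design.
import numpy as np

def motion_quality_gate(temporal_map, quality_score, threshold=0.5):
    # Stand-in for the motion-quality CNN: suppress the temporal saliency map
    # when its estimated motion quality falls below a placeholder threshold.
    return temporal_map if quality_score >= threshold else np.zeros_like(temporal_map)

def consistency_fusion(spatial_map, temporal_map, prior_map, eps=1e-6):
    # Weight each stream by how well it agrees with the other two streams,
    # one plausible reading of "multi-stream consistency as the fusion indicator".
    maps = [spatial_map, temporal_map, prior_map]
    weights = []
    for i, m in enumerate(maps):
        others = [maps[j] for j in range(3) if j != i]
        disagreement = np.mean([np.abs(m - o) for o in others])
        weights.append(1.0 / (disagreement + eps))
    weights = np.array(weights) / np.sum(weights)
    fused = sum(w * m for w, m in zip(weights, maps))
    return np.clip(fused, 0.0, 1.0)

# Toy usage on random 64x64 saliency maps.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spatial, temporal, prior = (rng.random((64, 64)) for _ in range(3))
    temporal = motion_quality_gate(temporal, quality_score=0.8)
    fused = consistency_fusion(spatial, temporal, prior)
    print(fused.shape)  # (64, 64)

In this reading, a low motion-quality score removes the temporal stream's contribution entirely, and the consistency weights then let the spatial and prior streams dominate the fused result.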
Keywords/Search Tags: Deep Convolutional Neural Network, Video Saliency Detection, Motion Quality, Prior Information, Multi-Stream Consistency Fusion