
Extraction Of Salient Region In Video Based On Deep Learning

Posted on: 2019-11-23    Degree: Master    Type: Thesis
Country: China    Candidate: X Wei    Full Text: PDF
GTID: 2428330590467425    Subject: Information and Communication Engineering

Abstract/Summary:
In complex scenes, humans can quickly identify regions of interest and understand what they see, an ability grounded in the visual attention mechanism of the human visual system. Visual information is mainly derived from received image or video data. When we look at an image, the eye preferentially locates the regions that stimulate vision most strongly: these are the salient regions. Introducing a human visual attention mechanism into computer image processing not only filters out useless data and improves computational efficiency, but also has important application value in many computer vision tasks. Salient region extraction in video aims to extract regions of interest from video frames by simulating the human visual attention mechanism.

In recent years, deep learning networks have performed well in object detection and image classification. This stems from the fact that deep learning can effectively discriminate complex features, and the features it extracts are better suited to the target task. Traditional methods, by contrast, mainly rely on hand-crafted features, which may not match the target task. The introduction of deep learning therefore greatly advances salient region extraction.

Based on a study of salient region extraction in video and related frontier techniques, this thesis proposes two algorithms based on deep learning. The first is built on the fusion of coarse and fine features: a dual-stream convolutional neural network learns coarse global information, recurrent connections refine the details, and the fusion is completed by cascading the networks. The second designs a conditional generative adversarial network to address the shortage of datasets for training. The loss function of the generative network is the sum of an adversarial loss and a content loss, where the content loss is the cross-entropy between the predicted saliency map and the ground truth.

The thesis compares the proposed models both qualitatively and quantitatively using three evaluation metrics: precision-recall curves, F-measure, and AUC. For the method based on coarse-fine feature fusion, adding recurrent connections to learn refined features increases precision by 10.76%. For the algorithm based on the conditional generative adversarial network, adversarial training with the discriminative network increases precision by up to 15.24%. Compared with six benchmark methods, both algorithms achieve state-of-the-art performance; in particular, the first method reaches 86.96% in precision and 86.72% in recall.
Keywords/Search Tags: video saliency, region extraction, convolutional neural network, generative adversarial network