ROI Extraction For Stereoscopic Video Based On Visual Attention

Posted on: 2014-01-02  Degree: Master  Type: Thesis
Country: China  Candidate: G Ye  Full Text: PDF
GTID: 2248330395976056  Subject: Information and Communication Engineering
Abstract/Summary:
Images and video have increasingly become the main form of multimedia. How to efficiently locate the image areas that viewers need most within large-scale image and video data has become an active research problem. Extraction of the region of interest (ROI) is one of the key techniques for solving it. An ROI is an area that attracts the viewer's interest and best represents the image's content. ROI extraction is important and widely used in the field of image processing and analysis, for example in JPEG 2000 compression coding, target location and identification in machine vision, caption extraction and recognition in video, and medical image analysis.

The human visual system can quickly and accurately focus on a few salient objects in an image or video. These objects are called regions of interest (ROI), and the process is called visual attention. Such areas usually differ markedly from their surroundings in brightness, texture, color, shape, and motion. Numerous visual attention models have been proposed, the most representative being the method of Itti and Koch. It first extracts brightness, color, and orientation features from the image, and then uses a "center-surround" mechanism to fuse the resulting feature maps into a final saliency map.

Three-dimensional (3D) video technologies are becoming increasingly popular in daily life. Because 3D video provides a higher-quality, more immersive experience than traditional 2D display, more and more people prefer it. Once depth information is introduced, however, traditional 2D image ROI extraction methods are no longer good enough to predict the saliency map for stereoscopic video. In this work, we make an in-depth study of the human visual attention mechanism and, using a bottom-up approach, propose a 3D visual attention model based on traditional 2D features, motion features, and depth information.

Another innovation of this work is feature fusion based on an artificial neural network (ANN). Previous methods often obtain the final saliency map by a simple linear combination of multiple saliency features, and therefore deviate considerably from actual human data. The "ground truth", drawn from eye-tracking data available online and from our own experimentally labeled data, serves as the training samples for the ANN, yielding a more powerful prior model whose predicted ROI is closer to that of the human visual system. The position and size of the ROI can then be located from the saliency map, but the result is not stable. We therefore use a Kalman filter to optimize it in the time domain, making the position and size of the ROI more accurate and stable. The experimental results show that the proposed 3D visual attention model predicts the ROI in stereoscopic video well.
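
The temporal smoothing step can be illustrated with a minimal sketch. The abstract does not give the state model or noise settings, so everything here is an assumption: each ROI parameter (center x, center y, width, height) is filtered independently by a one-dimensional Kalman filter with a random-walk model, and the names `ScalarKalman` and `smooth_roi` and the values of `q` and `r` are hypothetical, not the thesis's implementation.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a random-walk state model (an assumed model)."""
    q: float = 1.0    # process noise variance (assumed value)
    r: float = 25.0   # measurement noise variance (assumed value)
    x: float = 0.0    # current state estimate
    p: float = 1e6    # estimate variance; large so the first measurement dominates

    def update(self, z: float) -> float:
        # Predict: the state is assumed unchanged, uncertainty grows by q.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


def smooth_roi(rois, q=1.0, r=25.0):
    """Smooth a per-frame sequence of (cx, cy, w, h) ROI boxes."""
    filters = [ScalarKalman(q=q, r=r) for _ in range(4)]
    return [tuple(f.update(v) for f, v in zip(filters, box)) for box in rois]


if __name__ == "__main__":
    # Hypothetical detections whose center x jitters by +/-5 pixels per frame.
    raw = [(100 + (5 if i % 2 else -5), 50, 40, 30) for i in range(20)]
    for box in smooth_roi(raw)[-3:]:
        print(tuple(round(v, 2) for v in box))
```

A larger `r` relative to `q` trusts the per-frame saliency-based detection less and gives a steadier ROI, at the cost of a slower response when the salient object actually moves.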
Keywords/Search Tags: visual attention, saliency map, region of interest, artificial neural network