Research on Spatiotemporal Weighted Dissimilarity-Based Method for Video Saliency Detection

Posted on: 2015-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: T Xi
GTID: 2298330452453261
Subject: Computer Science and Technology
Abstract/Summary:
According to physiology and psychology, the human visual system tends to look first at the signal region that stimulates the nervous system most strongly. This region corresponds to the objects of interest in an image and to the moving objects in a video. Visual attention models simulate this behavior by automatically generating a saliency map of the image or video. Such models can be widely used in multimedia information description, object detection and classification, behavior analysis, transmission control in multimedia information networks, and other fields. The specific research work of this thesis is as follows.

First, this thesis reviews the physiological mechanism of visual attention, then the bottom-up and top-down saliency models, the relationship and distinction between image saliency and video saliency, and the research status of video saliency models. It also introduces some applications of visual attention models, including image compression and robot control. Based on the physiological and psychological mechanisms of visual attention in the human visual system, four factors that influence human eye gaze when watching a video are identified: appearance difference, spatial location difference, the difference in image block priority (that is, central bias), and the motion features of the image block.

Second, this thesis proposes a video saliency model based on temporal and spatial information, built on the four key factors above. The first three factors are used to generate a spatial saliency map. In general, people pay more attention to moving objects whose speed differs from that of other objects, so a motion perception model is built according to the mechanisms of human physiology and psychology to generate a temporal saliency map. From the spatial and temporal saliency maps, a spatiotemporal saliency fusion model is established by the weighted sum method. Results on multiple video clips from the datasets show that the proposed model is more consistent with real human gaze than existing international attention models.

Finally, following the framework of the saliency model above, this thesis puts forward a video saliency model based on sparse representation, which is in line with the physiological process of signal coding in the human eye. Because of the large amount of video data, video segmentation and key frame extraction are used to train the dictionary, which reduces the dictionary training time. In comparison with other methods, the video saliency model based on sparse representation predicts human eye gaze with higher accuracy.
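
The abstract gives no formulas, so the following is only an illustrative sketch of how the four factors and the weighted-sum fusion might fit together, not the thesis's actual method. Everything specific here is an assumption: the function names, the 1/(1 + distance) weighting of appearance differences, the Gaussian form of the central bias, the use of deviation from mean motion as the temporal cue, and the parameters `sigma` and `alpha`.

```python
import numpy as np


def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def spatial_saliency(features, positions, center, sigma=0.25):
    """Spatial saliency per image block (assumed formulation).

    features:  (n, d) appearance descriptor of each block
    positions: (n, 2) block centers, normalized to [0, 1]
    center:    (2,)   frame center in the same coordinates
    """
    # Factor 1: pairwise appearance difference between blocks.
    appearance = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    # Factor 2: pairwise spatial distance; nearby blocks count more (assumed weighting).
    distance = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    contrast = (appearance / (1.0 + distance)).sum(axis=1)
    # Factor 3: central bias, modeled here as a Gaussian falloff from the frame center.
    bias = np.exp(-np.sum((positions - center) ** 2, axis=1) / (2.0 * sigma ** 2))
    return normalize(contrast * bias)


def temporal_saliency(motion):
    """Factor 4: blocks whose motion differs from the average motion stand out.

    motion: (n, 2) motion vector of each block (assumed to be given).
    """
    deviation = np.linalg.norm(motion - motion.mean(axis=0), axis=1)
    return normalize(deviation)


def fuse(spatial, temporal, alpha=0.5):
    """Weighted-sum fusion; alpha is a hypothetical weight on the spatial map."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * spatial + (1.0 - alpha) * temporal
```

With both input maps normalized to [0, 1] and `alpha` in [0, 1], the fused map also stays in [0, 1], which makes maps from different frames directly comparable. How the thesis actually chooses the fusion weights is not stated in the abstract.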
Keywords/Search Tags: visual attention, video saliency, spatial distance, central bias, sparse coding