
Surveillance Key-frame Extraction With Spatio-temporal Graph Representation

Posted on: 2019-06-26    Degree: Master    Type: Thesis
Country: China    Candidate: Q Q Zhang    Full Text: PDF
GTID: 2348330542997628    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of digital multimedia and video surveillance technology, surveillance cameras appear almost everywhere, and the volume of recorded video has grown explosively. Browsing and storing these data has become difficult, and how to use video data effectively and describe video content concisely has become one of the hot topics in the computer vision community. Video key-frame extraction is a technique that condenses the content of an original video into a small number of representative frames. It provides a practical solution for efficient browsing, reduced storage, and large-scale video retrieval, and it can also be applied to tasks such as video retrieval, video compression, and video copy detection. Most surveillance videos are recorded by fixed cameras and therefore contain many useless frames with a large amount of redundant information. Few existing methods are designed for surveillance videos, and applying them directly to surveillance videos is unreasonable. Considering the characteristics of surveillance videos, this dissertation studies the problem of key-frame extraction. The major research contents are as follows.

(1) A key-frame extraction method based on spatio-temporal graph representation and visual attention is proposed. Compared with traditional key-frame extraction methods that rely only on low-level features, our method uses both low-level and high-level features to represent the objects in video frames. Taking all objects in a video as nodes, we represent the whole video as a spatio-temporal graph in which the edges describe the relationship between two objects and are computed from low-level and high-level features. Based on the constructed graph, we define a new inter-frame similarity measure to compute the correlation between two video frames. By applying the normalized cut algorithm, the video is segmented into a set of video chunks, each of which contains a visually consistent part of one event. Finally, a new key-frame selection function is defined, which extracts the most attentive video frame from each video chunk as the key-frame. Specifically, this function uses the visual attention mechanism to judge whether the current video frame is a key-frame. Constraints on content completeness, object distribution closeness, and uniformity are combined to define the attentive key-frame selection function, so that the frames selected from the original video are representative, complete, and consistent with human visual attention.

(2) To handle feature redundancy and noise, a novel spatio-temporal graph learning algorithm for key-frame extraction is proposed. The thesis pursues a graph learning algorithm that adaptively computes the affinity between two video objects while suppressing the redundancy and noise of the input features. Learning the relationship between each pair of objects with this spatio-temporal graph model works better than computing it directly from the original object features with the Euclidean distance. The relationship between any two video frames is then described according to the learned affinity matrix. The original video is thus divided into several visually consistent video chunks, and a key-frame is extracted from each chunk. Experimental results suggest that our approach suppresses feature redundancy and noise and thus achieves better performance than other baseline methods.
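For concreteness, the following Python sketch illustrates the pipeline of contribution (1) under stated assumptions. The thesis does not publish code, so the Gaussian affinity kernel, the mean-over-object-pairs inter-frame similarity, the use of scikit-learn's spectral clustering as a stand-in for the normalized cut, and the single attention-score selection rule are all simplifications chosen for illustration, not the author's exact formulation; all function and parameter names (object_affinity, alpha, sigma, etc.) are hypothetical.

```python
# Illustrative sketch only: every function name, weight, and kernel choice
# below is an assumption made for demonstration, not the thesis's method.
import numpy as np
from sklearn.cluster import SpectralClustering

def object_affinity(low_a, low_b, high_a, high_b, alpha=0.5, sigma=1.0):
    """Assumed edge weight between two video objects: a Gaussian kernel over
    a weighted combination of low-level and high-level feature distances."""
    d_low = np.linalg.norm(low_a - low_b)
    d_high = np.linalg.norm(high_a - high_b)
    return np.exp(-(alpha * d_low + (1 - alpha) * d_high) ** 2 / (2 * sigma ** 2))

def frame_similarity(objs_i, objs_j, alpha=0.5, sigma=1.0):
    """Assumed inter-frame similarity: mean affinity over all object pairs.
    Each frame is a list of (low_feature, high_feature) array tuples."""
    if not objs_i or not objs_j:
        return 0.0
    vals = [object_affinity(lo_a, lo_b, hi_a, hi_b, alpha, sigma)
            for lo_a, hi_a in objs_i for lo_b, hi_b in objs_j]
    return float(np.mean(vals))

def segment_video(frame_objects, n_chunks):
    """Build the frame-level similarity matrix and partition the video with
    normalized-cut style spectral clustering (scikit-learn's implementation,
    used here as a stand-in for the normalized cut in the thesis)."""
    n = len(frame_objects)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            W[i, j] = W[j, i] = frame_similarity(frame_objects[i], frame_objects[j])
    return SpectralClustering(n_clusters=n_chunks,
                              affinity="precomputed").fit_predict(W)

def select_keyframes(attention_scores, labels):
    """Pick, from each chunk, the frame with the highest attention score
    (a simplified stand-in for the attentive key-frame selection function).
    attention_scores is a NumPy array with one score per frame."""
    keyframes = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        keyframes.append(int(idx[np.argmax(attention_scores[idx])]))
    return sorted(keyframes)
```

In this sketch the attention scores are taken as given; in the thesis they come from a selection function that also weighs content completeness, object distribution closeness, and uniformity.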
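To make contribution (2) concrete, a minimal sketch of adaptive affinity learning follows. The thesis's exact objective and solver are not reproduced here; the closed-form adaptive-neighbor rule below (in the spirit of "clustering with adaptive neighbors") is one plausible way to learn a sparse, noise-suppressing affinity matrix instead of using raw Euclidean distances, and the helper name learn_affinity, the neighbor count k, and the symmetrization step are illustrative assumptions.

```python
# Sketch under assumptions: a row-sparse affinity learned from object features,
# used in place of raw Euclidean distances; not the thesis's exact algorithm.
import numpy as np

def learn_affinity(X, k=10):
    """Learn a row-sparse affinity matrix from object features X (n x d).
    Each row keeps only its k nearest neighbors, with weights from the
    closed-form simplex projection of negative squared distances."""
    n = X.shape[0]
    k = min(k, n - 2)                               # need at least k+1 neighbors
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(D, np.inf)                     # exclude self-affinity
    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        d_k = D[i, order[:k]]                       # k smallest distances
        d_kp1 = D[i, order[k]]                      # (k+1)-th smallest distance
        denom = k * d_kp1 - d_k.sum() + 1e-12
        S[i, order[:k]] = np.maximum((d_kp1 - d_k) / denom, 0.0)
    return (S + S.T) / 2.0                          # symmetrize for spectral use
```

The learned affinity matrix can then feed the same normalized-cut segmentation and per-chunk key-frame selection described in contribution (1).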
Keywords/Search Tags: Frame Similarity, Spatio-Temporal Graph, Normalized Cut, Visual Attention, Key-Frame Selection