Video Summarization Techniques Based On User Interest And Content Importance Learning

Posted on: 2020-03-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M J Fei
GTID: 1368330572982979
Subject: Control Science and Engineering
Abstract/Summary:
The development of video surveillance and the mobile Internet has caused explosive growth in the amount of user-captured video. Faced with this volume, traditional video access and storage show significant limitations for emerging multimedia services such as content-based video search, browsing and video surveillance. On the one hand, users spend a great deal of time watching original videos and cannot quickly find the content they want. On the other hand, the volume makes data storage on video websites more difficult. Video summarization techniques address this by extracting important frames or segments from long videos to form shorter summaries, enabling more effective and efficient video browsing and storage. Video summarization has developed considerably in recent years, but designing a general summarization algorithm remains difficult because video content is complex and variable, and extracting meaningful and interesting summaries is still a major challenge. This dissertation therefore studies several types of video and proposes several summarization methods that improve the performance of video summarization algorithms and generate summaries of interest to users. Specifically, it covers the following four aspects.

1) A key frame extraction method based on sparse selection and hierarchical clustering is proposed. Traditional methods use a sparse selection model to extract key frames directly. However, ensuring the sparsity of the dictionary is difficult, which results in a large number of redundant frames among the key frames. In this work, the sparse selection model is first applied to the original video to select a large number of candidate key frames rather than the most representative key frames, which overcomes the difficulty of ensuring the sparsity of the dictionary. Second, with perceptual hash-based mutual information as the similarity measure, the candidate key frames are passed to an improved hierarchical clustering algorithm that removes redundant frames and extracts the most representative key frames as the video summary more accurately.

2) A video summarization method based on content importance features is proposed. Work 1) focuses on making the extracted key frames representative of the original video, but ignores the video content that users are interested in. To identify the most interesting video content, two high-level semantic features related to the interestingness of images, memorability and video snap point, are used as content importance features to evaluate the importance of video frames. First, a key frame extraction method based on image memorability and entropy is proposed to ensure that the extracted video summaries are interesting and diverse. Second, for unedited user videos, a dynamic user video summarization method based on memorability, video snap point and motion information is proposed. A linear regression model is trained to fuse the three features; it predicts the importance of video segments, and the segments that contain interesting frames or moving objects are extracted and combined into a short video. Compared with key frames, dynamic video summaries are presented to users as a condensed video, which not only retains the content that users are most interested in but also better expresses the dynamic semantics of the video. Experimental results on common datasets such as YouTube and SumMe demonstrate that the proposed method achieves better results and generates video summaries that better match users' preferences.

3) A dynamic video summarization method based on a web image prior and deep ranking is proposed. As videos become longer, the content of user videos becomes more complex and variable. To identify the interesting segments that users prefer more effectively, a deep network is proposed for predicting the importance of video frames, replacing the hand-crafted high-level semantic features of Work 2). Because web images have been carefully selected by users and uploaded for online sharing, they show what is worth capturing. A large number of web images related to the video content are collected as priors, and a deep ranking network with an improved triplet loss is proposed to learn the importance relationship between "interesting" and "non-interesting" images. The trained ranking model is then used to determine what users are interested in and to extract dynamic video summaries. Experimental results show that the improved triplet deep ranking model solves the convergence problem of existing methods. In addition, compared with methods that use hand-crafted features to calculate the similarity between video frames and web images, the proposed method, which uses a deep network to mine web images, can capture users' judgments about particular content and thus selects important video segments more accurately.

4) A compact and rich key frame extraction method is proposed. Although Work 2) uses memorability and entropy to maximize the interestingness and richness of the extracted key frames, such key frames easily lose the dynamic semantic content of the original video. Dynamic video summaries, in contrast, can contain dynamic semantic content but cannot present the video content at a glance; viewers must play and browse them to obtain the content. To generate static video summaries that contain dynamic semantic information and can be presented to viewers in a simple and intuitive manner, this dissertation proposes generating one compact and rich key frame from each video shot to optimally represent all the dynamic information of a shot with a fixed background. After perceptual hash-based mutual information is used to segment the original video into multiple shots, the method detects, segments and selects the optimal moving objects in each shot. Finally, using KNN matting, the selected moving objects in each shot are stitched into one frame that serves as the key frame. Compared with existing methods, the proposed method compresses the content of a shot onto a single key frame, producing a compact and rich key frame that better expresses the semantic content of the video.
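
As an illustration of the feature fusion step in Work 2), the following is a minimal sketch and not the dissertation's implementation. It assumes each video segment is already described by three precomputed scores (memorability, snap point and motion) and that importance labels are available for training. The function names, the toy numbers and the greedy length-budget selection are assumptions made for illustration; the abstract states only that a linear regression model fuses the three features to predict segment importance.

```python
import numpy as np


def fit_importance_model(features, scores):
    """Fit weights w and bias b minimizing ||features @ w + b - scores||^2."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(scores, dtype=float)
    # Append a constant column so the last coefficient acts as the bias.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_importance(features, weights, bias):
    """Predict one importance score per segment from its three features."""
    return np.asarray(features, dtype=float) @ weights + bias


def select_segments(importance, lengths, budget):
    """Greedily keep the highest-scoring segments that fit in `budget` seconds.

    The budgeted greedy rule is an assumption; the abstract does not say
    how segments are chosen once their importance is predicted.
    """
    order = np.argsort(-np.asarray(importance, dtype=float))
    chosen, used = [], 0.0
    for i in order:
        if used + lengths[i] <= budget:
            chosen.append(int(i))
            used += lengths[i]
    return sorted(chosen)  # restore temporal order


# Toy data (made up): columns are memorability, snap-point score, motion.
train_X = [[0.9, 0.8, 0.3], [0.2, 0.1, 0.1], [0.6, 0.7, 0.9],
           [0.3, 0.2, 0.8], [0.1, 0.3, 0.2]]
train_y = [0.85, 0.10, 0.80, 0.45, 0.15]
w, b = fit_importance_model(train_X, train_y)

test_X = [[0.8, 0.9, 0.4], [0.1, 0.1, 0.1], [0.5, 0.6, 0.9], [0.2, 0.2, 0.2]]
test_len = [4.0, 5.0, 3.0, 6.0]  # segment durations in seconds
pred = predict_importance(test_X, w, b)
summary = select_segments(pred, test_len, budget=8.0)
print("selected segment indices:", summary)
```

Returning the chosen indices in temporal order keeps the condensed video consistent with the timeline of the original.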
Keywords/Search Tags: Key frame extraction, Dynamic user video summarization, Sparse selection, Hierarchical clustering, Perceptual hash-based mutual information, Entropy, Memorability, Video snap point, Web image, Deep ranking network, Optimal selection