With the widespread availability of low-cost camera recorders and the advance of the Internet, video is becoming an important way for people to record their everyday lives and communicate with each other. A large number of videos are generated every day, covering many different genres, including news, sports, TV series, shows and self-recorded footage. Such a huge volume of video is difficult to view in full and places a tremendous storage burden on video servers and websites. People therefore strongly wish to extract only the important content of a video, for fast viewing and efficient storage, and video summarization is precisely a method that satisfies this requirement. Although much progress has been made on video summarization, current techniques are far from mature. This dissertation aims to further improve video summarization performance by proposing several novel methods.

This dissertation conducts in-depth research on the problems of video summarization. Nowadays there are many different types of videos; moreover, a single video may contain many different scenarios, and these scenarios vary greatly. Such strong diversity places a high demand on the adaptability of video summarization methods: they need to adaptively extract features, detect shot boundaries, find key frames and generate summaries for different videos. This dissertation aims to meet these requirements. Building on existing video summarization work, it combines dictionary learning, sparse coding and deep learning to conduct in-depth research and propose methods for feature extraction, shot boundary detection and video content importance evaluation. Our methods were tested on standard datasets, and the test results were thoroughly analyzed. Our main contributions are as follows.

1) We propose a video summarization method based on graph-regularized sparse
coding. Conventional methods usually use hand-crafted features to describe video content and rely heavily on domain-specific knowledge. Because video content varies strongly, such fixed feature extraction may not work well for all videos. To make the algorithm robust to this variation and ensure good performance across different videos, we apply dictionary learning and sparse coding to adaptively learn a feature space and extract features according to the content of the video at hand. Our feature extraction method describes video content more accurately and greatly enhances the robustness of video summarization across diverse videos.

2) We propose an adaptive video summarization method. After extracting features from video frames, a shot boundary detection method is applied to acquire the structure of the video, which serves as a reference for summarization. Existing methods detect shot boundaries by first measuring the similarity between adjacent frames and then applying a fixed threshold. However, because video content varies greatly, a fixed threshold may not yield the desired accuracy for all videos: different videos have quite different content and call for different "optimal" thresholds. To resolve this issue, our method adjusts the shot boundary detection threshold according to the video content. In this way we enhance the adaptability of shot boundary detection, which is essential for video summarization performance.

3) We propose a method to summarize video content based on an autoencoder. After detecting shot boundaries and acquiring the structure of the video, we need to evaluate the importance of the video content and select the most important parts to generate the summary. Importance evaluation is crucial and complicated, because its results directly affect the quality of the video
summary. At the same time, such evaluation is quite subjective and abstract, so it is difficult to capture the evaluation principle with a set of formulas. To accomplish this task, we use the title of the video as a search query and retrieve relevant images from the Internet. An autoencoder is then trained on the video frames and the retrieved web images to learn the common pattern knowledge they share. Finally, the trained autoencoder evaluates the importance of the video content, and a summary is generated according to this importance information. By using deep networks to mine information from web images, our method yields importance evaluations that are highly consistent with those of most people, which improves evaluation accuracy.

4) Our proposed methods were tested on standard datasets such as VSUMM, YouTube and SumMe. The test results were thoroughly analyzed and confirm that our methods achieve better performance than conventional methods and yield better video summaries.
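As an illustration of the sparse coding step in contribution 1), the sketch below solves the standard L1-regularized coding problem over a given dictionary with ISTA. This is a minimal stand-in: the graph-regularization term and the dictionary learning loop of the actual method are omitted, and all names and parameters here are illustrative rather than the dissertation's.

```python
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iter=100):
    """Sparse-code x over dictionary D by solving
    min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.
    D: (d, k) dictionary with atoms as columns; x: (d,) signal."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the smooth data term
        z = a - grad / L                   # gradient step
        # soft-thresholding enforces sparsity of the code
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a
```

The resulting code `a` serves as the adaptive feature for one frame; in the full method the dictionary itself would also be learned from the video's frames.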
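The adaptive thresholding idea in contribution 2) can be sketched as follows: instead of one fixed cut-off for every video, the threshold is derived from the statistics of this video's own frame-to-frame distances (here mean plus k standard deviations; the dissertation's exact adaptation rule may differ, and the feature representation is assumed given).

```python
import numpy as np

def detect_shot_boundaries(frame_features, k=2.0):
    """Flag a shot boundary wherever the distance between adjacent
    frames exceeds an adaptive threshold mean + k*std, computed
    from this particular video's distance sequence.
    frame_features: (n_frames, dim) array, one feature row per frame."""
    d = np.linalg.norm(np.diff(frame_features, axis=0), axis=1)
    thr = d.mean() + k * d.std()           # content-adaptive threshold
    return [i + 1 for i, v in enumerate(d) if v > thr]
```

Because the threshold is recomputed per video, a quiet interview and a fast-cut music video each get a cut-off suited to their own content, which is the adaptability the contribution argues for.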
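The importance-evaluation idea in contribution 3) can be sketched with a tiny linear autoencoder trained by plain gradient descent: content that the model (trained on video frames plus retrieved web images) reconstructs well is scored as more important. This is a deliberately simplified stand-in for the deep autoencoder described above; the architecture, hyperparameters and the `exp(-error)` scoring are illustrative assumptions.

```python
import numpy as np

class LinearAutoencoder:
    """Single-hidden-layer linear autoencoder trained with MSE loss."""

    def __init__(self, d_in, d_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.We = rng.normal(0.0, 0.1, (d_in, d_hid))   # encoder weights
        self.Wd = rng.normal(0.0, 0.1, (d_hid, d_in))   # decoder weights

    def forward(self, X):
        return (X @ self.We) @ self.Wd                  # reconstruction

    def fit(self, X, lr=0.01, epochs=500):
        for _ in range(epochs):
            H = X @ self.We
            E = H @ self.Wd - X                         # residual
            gWd = H.T @ E / len(X)                      # grad of 0.5*mean||E||^2
            gWe = X.T @ (E @ self.Wd.T) / len(X)
            self.Wd -= lr * gWd
            self.We -= lr * gWe

    def importance(self, X):
        # Frames matching the learned common pattern reconstruct well,
        # so low reconstruction error maps to a high importance score.
        err = ((self.forward(X) - X) ** 2).mean(axis=1)
        return np.exp(-err)
```

In the full method, the training set mixes the video's frames with web images retrieved via the video title, and the summary keeps the segments with the highest importance scores.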
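Summaries on datasets such as VSUMM, YouTube and SumMe are commonly scored with a frame-level F-measure against user-annotated reference summaries. A minimal version of that metric, assuming both summaries are given as sets of selected frame indices (the datasets' official evaluation protocols differ in detail):

```python
def fscore(selected, ground_truth):
    """Frame-level F-measure between a generated summary and a
    reference summary, both given as iterables of frame indices."""
    sel, gt = set(selected), set(ground_truth)
    overlap = len(sel & gt)
    p = overlap / len(sel) if sel else 0.0   # precision
    r = overlap / len(gt) if gt else 0.0     # recall
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)
```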