
Network User Video Summarization Technology In Complex Scenes

Posted on: 2022-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wang
Full Text: PDF
GTID: 2518306533479584
Subject: Computer technology
Abstract/Summary:
In recent years, with the development of multimedia technology, video has become a valuable information resource. Owing to the widespread availability of devices with shooting capability, the volume of video data is growing exponentially, which makes video storage, retrieval, and dissemination very costly. Technologies for quickly browsing and querying video have therefore become a focus of research. Traditional video summarization techniques are mainly aimed at structured videos such as movies and news, whereas network user videos contain complex scenes and shot boundaries and are flooded with meaningless frames, so traditional techniques perform poorly on the network user video summarization task. To generate a video summary that represents the original video content with few redundant frames, this thesis studies video summarization algorithms as follows:

(1) With the development of deep learning, video summarization based on long short-term memory networks (LSTM) has achieved promising results. Although LSTM is effective at modeling the temporal information of video, it cannot explicitly assign different weights to different frames of the input sequence; in that case, all frames of the input video sequence carry the same importance. To address this, this thesis adopts an autoencoder framework that introduces an attention mechanism. The encoder uses a BiLSTM to capture the context information of the video sequence, and the decoder introduces an attention mechanism. At the same time, the frame selector is improved: the decoder hidden state and the video context vector are used jointly to predict the importance score of each frame of the input sequence.

(2) Considering that annotated video datasets are scarce and expensive to produce, this thesis further proposes a video summarization model based on unsupervised learning. The model adopts a Generative Adversarial Network as its framework: the generator uses the autoencoder network with the attention mechanism to generate reconstructed features of the input video, and the discriminator is essentially a binary classifier that determines the category of the input video features. The objective function of the model is designed for training according to the task objectives of the Generative Adversarial Network and of video summarization. Unlike the supervised method above, this model is trained on the OVP and YouTube datasets.

To verify the performance of these models, experiments are conducted on the TVSum and SumMe datasets. The comparison results show that the two models designed in this thesis outperform other algorithms of the same type and can generate representative, low-redundancy video summaries.
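The attention-based frame scoring in (1) can be sketched minimally in NumPy. This is an illustration, not the thesis's exact architecture: the dot-product attention scoring, the feature dimensions, and the sigmoid frame selector with weights `w_sel` are all assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend_and_score(h_dec, enc_states, w_sel):
    """Score one decoding step.

    h_dec      : (d,)   decoder hidden state
    enc_states : (T, d) BiLSTM encoder outputs, one vector per frame
    w_sel      : (2d,)  frame-selector weights (hypothetical)
    """
    # Attention assigns one weight per input frame, instead of
    # treating all frames as equally important.
    alpha = softmax(enc_states @ h_dec)        # (T,), non-negative, sums to 1
    context = alpha @ enc_states               # (d,) weighted context vector
    # Importance score predicted from decoder state + context vector.
    score = sigmoid(w_sel @ np.concatenate([h_dec, context]))
    return alpha, score

rng = np.random.default_rng(0)
T, d = 5, 8                                    # 5 frames, feature dim 8
alpha, score = attend_and_score(rng.normal(size=d),
                                rng.normal(size=(T, d)),
                                rng.normal(size=2 * d))
print(alpha, score)
```

In a full model the selector weights would be learned jointly with the encoder and decoder; here they are random, and only the shapes of the computation are meant to be informative.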
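The adversarial training objective in (2) can also be illustrated with a small sketch. The thesis does not spell out its loss terms here, so the binary cross-entropy discriminator loss and the reconstruction-plus-adversarial generator loss below are assumptions about one common way such an objective is composed.

```python
import numpy as np

def bce(p, label, eps=1e-7):
    # Binary cross-entropy for a single predicted probability.
    p = np.clip(p, eps, 1.0 - eps)
    return -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))

def discriminator_loss(d_orig, d_recon):
    # Original-video features labeled 1, reconstructed features labeled 0.
    return bce(d_orig, 1.0) + bce(d_recon, 0.0)

def generator_loss(feats, recon_feats, d_recon):
    # Reconstruction term keeps the summary faithful to the input video...
    rec = np.mean((feats - recon_feats) ** 2)
    # ...while the adversarial term pushes reconstructions to fool D.
    adv = bce(d_recon, 1.0)
    return rec + adv

feats = np.ones(4)
good = generator_loss(feats, feats * 0.9, d_recon=0.8)  # close, fools D
bad = generator_loss(feats, feats * 0.1, d_recon=0.2)   # far, caught by D
print(good < bad)  # → True
```

This captures the unsupervised signal: no frame-level annotations are needed, because the generator is penalized only through reconstruction quality and the discriminator's judgment.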
Keywords/Search Tags: Video summarization, Autoencoder, Attention mechanism, Generative Adversarial Network, Unsupervised learning