
Research On Video Summary Generation Technology Based On Deep Learning

Posted on: 2022-11-25    Degree: Master    Type: Thesis
Country: China    Candidate: L T Li    Full Text: PDF
GTID: 2518306764488434    Subject: Automation Technology
Abstract/Summary:
With the growth of mobile devices and the Internet, the volume of video being produced has exploded. Video combines images and audio and carries a large amount of information, which poses major challenges for fast video browsing, retrieval, surveillance, and other multimedia industries. On the one hand, watching an original video in full takes considerable time, while dragging the progress bar to fast-forward easily skips important content; on the other hand, storing such large volumes of video data places higher demands on storage. Video summarization technology was proposed to address these problems: it takes the video as its processing object, decomposes it hierarchically while analyzing its content, identifies the segments that contain important information, and assembles those segments into a short video that captures the main content of the original. This provides a new approach to fast video browsing and core-content storage. In recent years, video summarization has developed rapidly, but most methods adapt poorly to videos with complex, varied content and differing structures, and show clear deficiencies. This paper therefore proposes three dynamic video summarization methods that address these shortcomings and improve algorithm performance:

(1) A video summarization method based on an improved bi-directional long short-term memory (BiLSTM) network. Unlike still images, video has a time axis and thus temporal information; many traditional summarization methods extract features from content without considering the correlation between frames, and they often suffer from high time complexity and over-fitting. We therefore propose a summarization model that uses a BiLSTM network while optimizing its features. First, deep image features are extracted with VGG16. To make the final summary as comprehensive as possible, the BiLSTM network transforms the original feature-recognition task into an importance-evaluation task, which reduces the amount of computation. Second, because the generated summary should also contain the key information, we add max pooling after the BiLSTM to reduce the feature dimension and better highlight key information, so that the model learns important features more easily. Reducing feature complexity also reduces the parameters required by the fully connected layer and the occurrence of over-fitting. Finally, each video frame's importance is scored, shot scores are computed from the frame scores, and the key shots are selected to generate the video summary.

(2) A video summarization method based on a self-attention mechanism and random forest regression. The key task of video summarization is to obtain the key segments, so if the model can better learn important segments and reduce the impact of data fluctuation, summary accuracy can be effectively improved. Building on an encoder-decoder architecture, we introduce a self-attention mechanism and random forest regression. Specifically, GoogLeNet first extracts deep image features, and the self-attention mechanism then computes feature weights to increase the proportion of important features. The core of the decoder is a BiLSTM network. In parallel, a random forest re-predicts the results, and the two models' predictions are combined through a weight parameter to resist the fluctuation caused by inaccurate predictions. Experimental results verify the effectiveness of our method and its improved stability and prediction accuracy.

(3) Video summarization extracts the key frames or clips of the original video to generate a summary, which greatly shortens viewing time without losing the main content and achieves the
effect of fast browsing. Most existing methods improve only on image features, ignoring the temporal order between frames, and their models lack the ability to learn autonomously. We propose a video summarization network based on the Transformer and deep reinforcement learning. The network takes the Transformer's encoder-decoder as its main structure: the encoder consists of the Transformer's self-attention and feed-forward neural network modules, while a BiLSTM combined with reinforcement learning replaces the Transformer's decoder. Experiments on two public standard video summarization datasets demonstrate the effectiveness of the proposed method. The Transformer encoder has excellent processing ability for image features, while the BiLSTM in the decoder decodes time-series data well. The reinforcement learning strategy optimizes the model parameters, improves generalization, and makes the generated summaries more representative and diverse.
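The scoring pipeline of method (1) can be illustrated with a minimal numpy sketch. This is not the thesis implementation: random vectors stand in for the VGG16+BiLSTM features, the window size, shot boundaries, and linear scoring head are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in features: in the thesis these come from VGG16 followed by a
# bi-directional LSTM; here random vectors play that role (illustrative only).
n_frames, feat_dim = 12, 8
frame_feats = rng.normal(size=(n_frames, feat_dim))

def temporal_max_pool(x, window=3):
    # Channel-wise max over a sliding temporal window: reduces feature
    # complexity and highlights the strongest activations ("key information").
    pooled = np.empty_like(x)
    for t in range(len(x)):
        lo, hi = max(0, t - window // 2), min(len(x), t + window // 2 + 1)
        pooled[t] = x[lo:hi].max(axis=0)
    return pooled

pooled = temporal_max_pool(frame_feats)

# A linear layer + sigmoid stands in for the fully connected scoring head.
w, b = rng.normal(size=feat_dim), 0.0
frame_scores = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))

# Shot score = mean of its frame scores; keep the highest-scoring shots.
shot_bounds = [(0, 4), (4, 8), (8, 12)]  # assumed shot segmentation
shot_scores = [frame_scores[a:b].mean() for a, b in shot_bounds]
summary_shots = sorted(range(len(shot_bounds)), key=lambda i: -shot_scores[i])[:2]
```

The pooled features feed a much smaller scoring head than the raw features would, which is the mechanism the abstract credits with reducing over-fitting.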
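Method (2) hinges on two operations: self-attention re-weighting of features and a weighted blend of two predictors. The sketch below shows both in numpy under simplifying assumptions: identity Q/K/V projections (the thesis learns these), random stand-ins for the GoogLeNet features and for the BiLSTM and random-forest outputs, and an arbitrary blending weight.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4
feats = rng.normal(size=(n, d))  # stand-in for GoogLeNet frame features

def self_attention(x):
    # Single-head scaled dot-product attention with identity projections:
    # each output row is a similarity-weighted mix of all frame features.
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    return attn @ x

weighted = self_attention(feats)

# Two importance predictions (BiLSTM decoder vs. random forest regression);
# random values stand in for both models' outputs here.
pred_lstm = rng.uniform(size=n)
pred_rf = rng.uniform(size=n)
alpha = 0.7  # blending weight, a tunable hyperparameter (assumed value)
final_scores = alpha * pred_lstm + (1 - alpha) * pred_rf
```

The convex combination keeps the final score within the range of the two predictors, which is how the blend damps fluctuations from either model's inaccurate predictions.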
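Method (3) rewards summaries that are "representative and diverse" (the keywords also list a reward function). The abstract does not give the reward's exact form; the sketch below uses one common formulation from the video-summarization RL literature, a diversity term (mean pairwise dissimilarity of selected frames) plus a representativeness term (how well selected frames cover all frames), as an assumed illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 5
feats = rng.normal(size=(n, d))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalise

selected = np.array([1, 4, 8])  # indices a policy chose for the summary

def diversity_reward(x, idx):
    # Mean pairwise dissimilarity (1 - cosine similarity) among selected frames.
    sub = x[idx]
    sim = sub @ sub.T
    m = len(idx)
    off_diag_mean = (sim.sum() - np.trace(sim)) / (m * (m - 1))
    return 1.0 - off_diag_mean

def representativeness_reward(x, idx):
    # exp(-mean distance from every frame to its nearest selected frame):
    # closer coverage of the whole video gives a reward nearer to 1.
    dists = np.linalg.norm(x[:, None, :] - x[None, idx, :], axis=2)
    return float(np.exp(-dists.min(axis=1).mean()))

reward = diversity_reward(feats, selected) + representativeness_reward(feats, selected)
```

A policy-gradient update would then reinforce frame selections that raise this reward, which is the mechanism by which training pushes summaries toward diversity and representativeness.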
Keywords/Search Tags:Video Summary, Bi-directional Long Short-Term Memory, Convolutional Neural Networks, Self Attention, Random Forest Regression, Transformer, Reinforcement Learning, Reward Function