
Research And Implementation Of Video Summarization Method Based On Deep Learning

Posted on: 2023-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: D Xu
Full Text: PDF
GTID: 2568306836972089
Subject: Electronic and communication engineering
Abstract/Summary:
With the arrival of the 5G era and the continuous development of computer and digital video technology, the amount of video data is growing day by day. This creates a need for technology that can significantly shorten a video while retaining the main content of the original. Video summarization meets this need: it reduces viewing time and saves a large amount of storage space. However, video resources are diverse, and each kind of video has its own characteristics. For movies and TV dramas, what matters is the plot, while for surveillance video, what matters is the targets in the video. This diversity poses a major challenge to video summarization technology, and achieving a good summary requires a method matched to the type of video. This thesis therefore divides videos into surveillance videos and non-surveillance videos. Surveillance video records the events that happen in an environment, and most of the recorded content has no storyline; non-surveillance video is mainly produced for entertainment, and most of the recorded content has a storyline. A dynamic video summarization algorithm can handle non-surveillance videos and preserve the integrity of the plot while compressing the video. A video condensation system can process surveillance video, shortening its duration while ensuring that no target object in the video is lost. This thesis focuses on dynamic video summarization algorithms and a video condensation system, and mainly includes the following three parts.

(1) For dynamic video summarization, an anchor-based two-stage method is proposed to address the insufficient use of inter-frame information in the video summarization task. The network consists mainly of a feature extraction network, a one-dimensional convolutional neural network, and a two-stage network. The feature extraction network extracts the image features of each frame. The one-dimensional convolutional neural network uses one-dimensional convolution operations to extract information between frames. In the two-stage structure, the processing in the first stage reduces the amount of computation in the second stage. The second stage performs finer regression and classification on the results of the first stage and outputs more accurate shot positions and shot scores. The anchor mechanism is used in training this network: single-scale anchors in the first stage and multi-scale anchors in the second stage. The final output of the network is obtained by regression from the anchors, which greatly reduces the difficulty of optimization.

(2) For dynamic video summarization, a method that integrates the self-attention mechanism is proposed. It addresses the limitation that the one-dimensional convolutional neural network in the anchor-based two-stage network can extract only local features. A one-dimensional convolutional neural network can obtain information between frames by continually stacking layers to deepen the network, but convolution is essentially a local operation that captures only local information. The attention mechanism, by contrast, is a long-range operation: it needs no external information and can capture internal correlations from the data sequence itself, so it obtains the information between frames in a video more efficiently. By using the self-attention mechanism, the network can understand the video more fully, and no additional network parameters need to be added.

(3) For video condensation, a method is designed and implemented in three steps: motion track extraction, motion track processing, and motion track fusion. It addresses the practical problem of overly long surveillance video. In the motion track extraction step, the motion tracks of the target objects in the video are obtained through object detection and tracking algorithms. In the motion track processing step, the tracks of the target objects are rearranged, and redundant information is filtered out by a semantic segmentation algorithm. In the motion track fusion step, background modeling is used to obtain the video background, and the rearranged tracks are then fused with the background to obtain the final condensed video.
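
The contrast drawn in part (2) between local convolution and long-range self-attention can be illustrated with a minimal sketch. This is not the thesis's network. It assumes the simplest parameter-free form of scaled dot-product self-attention, in which the per-frame features serve directly as queries, keys, and values; the function name `self_attention` and the toy features are hypothetical.

```python
import math


def self_attention(frames):
    """Parameter-free scaled dot-product self-attention (illustrative assumption).

    frames: list of T per-frame feature vectors (lists of floats), each of length d.
    Returns T vectors; each one is a softmax-weighted average of ALL frame features,
    so every output depends on the whole sequence rather than a local window.
    """
    d = len(frames[0])
    outputs = []
    for query in frames:
        # Similarity between this frame and every frame in the video.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in frames
        ]
        # Softmax over the scores (subtract the max for numerical stability).
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all frame features.
        outputs.append(
            [sum(w * f[j] for w, f in zip(weights, frames)) for j in range(d)]
        )
    return outputs


# Toy input: frames 0 and 2 share the same feature, frame 1 differs.
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended = self_attention(toy_frames)
```

In this toy input, frames 0 and 2 are two steps apart yet attend strongly to each other in a single step, whereas a one-dimensional convolution with a small kernel would need stacked layers to relate them. Because the sketch has no learned projection matrices, it adds no parameters.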
Keywords/Search Tags: dynamic video summarization, two-stage network, anchor-based, video condensation, object tracking, compressed video, background modeling