
Research On Video Content Interpretation Method Based On Deep Learning Technology

Posted on: 2024-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: W J Liu
Full Text: PDF
GTID: 2568307112457864
Subject: Computer technology
Abstract/Summary:
With the development of multimedia technology, the volume of video data is growing explosively. Rather than watching an entire video, viewers want its key information. Current research on video captioning still focuses on describing short videos and video clips: these studies typically use convolutional neural networks to extract visual features from video frames and recurrent neural networks to generate natural language. Long surveillance videos, however, are large in volume and relatively stable in content, so retrieving and watching them is tedious work; locating key events and interpreting long video content has therefore become a research focus.

Existing encoder-decoder models for short-video description cannot be applied directly to long videos, and their description quality degrades when visual feature extraction is insufficient or when key segments are missed under frequent scene switching. To address these problems, a video captioning method based on superframe cutting of long videos is proposed. First, a superframe extraction algorithm computes the proportion of key video time, so as to meet a video browsing time limit and shorten retrieval time. A two-layer filtering model adaptively extracts superframes, filters out redundant keyframes, and supports multi-scene semantic description; each retained superframe is then embedded with its surrounding frames to build key video clips. Second, in the encoder-decoder model, a deep network with small convolution kernels pools the sampling domain to obtain richer video features, and an attention mechanism is introduced; this overcomes the difficulty that classic video captioning methods cannot directly process long videos. Finally, a long short-term memory (LSTM) network replaces the plain recurrent decoder to generate video captions, giving segment-by-segment interpretation of the video content.

The method is tested on YouTube dataset videos, synthetic videos, and long surveillance videos, and its performance is evaluated with several machine translation metrics. Experimental results show that it produces better segment descriptions under the challenges of frequent scene switching and long video duration.
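The superframe-cutting pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's actual implementation: the per-frame importance scores, the function names (`extract_superframes`, `filter_redundant`, `build_clips`), and the greedy selection strategy are all hypothetical stand-ins for the thesis's superframe extraction algorithm and two-layer filtering model.

```python
import numpy as np

def extract_superframes(frame_scores, time_ratio):
    """First filtering layer (hypothetical): keep the highest-scoring
    frames until the selection reaches `time_ratio` of the full video,
    i.e. the key-video time proportion / browsing time limit."""
    n = len(frame_scores)
    budget = int(n * time_ratio)              # frames allowed by the time ratio
    order = np.argsort(frame_scores)[::-1]    # frames by descending importance
    return sorted(order[:budget].tolist())

def filter_redundant(keyframes, min_gap):
    """Second filtering layer (hypothetical): drop keyframes closer than
    `min_gap` frames to the previously kept one, removing redundancy."""
    kept = []
    for f in keyframes:
        if not kept or f - kept[-1] >= min_gap:
            kept.append(f)
    return kept

def build_clips(superframes, context, n_frames):
    """Embed each retained superframe into its surrounding frames,
    yielding (start, end) key video clips; overlapping clips are merged."""
    clips = []
    for f in superframes:
        start, end = max(0, f - context), min(n_frames, f + context + 1)
        if clips and start <= clips[-1][1]:
            clips[-1] = (clips[-1][0], end)   # merge with previous clip
        else:
            clips.append((start, end))
    return clips
```

Each resulting clip would then be fed to the encoder-decoder model (CNN features plus attention, LSTM decoder) to produce one segment description per clip.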
Keywords/Search Tags: Superframe cutting, Time ratio, Multi-scene semantics, Attention mechanism, Video caption