
Research Of Video Spatio-temporal Feature Extraction And Retrieval Algorithm Based On Deep Learning

Posted on: 2021-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cheng
GTID: 2518306545457404
Subject: Control Engineering
Abstract/Summary:
The rapid development of Internet technology has created favorable conditions for producing and disseminating videos. For massive volumes of video data, traditional text-based video retrieval can no longer meet users' needs. To analyze and manage videos quickly and efficiently, content-based video retrieval has become an active research topic. Most current content-based video retrieval systems match videos using low-level features such as the color, texture, and shape of key frames. These low-level features are extracted by methods based on global statistics or hand-crafted design. They generalize poorly, have limited abstraction ability, and are sensitive to lighting and noise. They also ignore the temporal correlation between video frames, which leads to low retrieval accuracy. To address these problems, this thesis proposes a key frame extraction algorithm based on deep features and a spatio-temporal feature extraction framework based on a 3D convolutional neural network, and designs a video similarity matching rule that combines spatio-temporal features with deep features. Finally, a prototype video retrieval system based on deep learning was developed and implemented, achieving good retrieval speed and accuracy. The main research contents of this thesis are as follows:

(1) A key frame extraction algorithm based on deep features is proposed. The method uses a convolutional neural network to obtain the deep features of video frames and uses the differences between these features as the basis for key frame extraction. Experiments show that the algorithm is simple, efficient, and robust, and effectively avoids missed and false key frame detections.

(2) A spatio-temporal feature extraction framework based on a 3D convolutional neural network is presented. The 3D convolutional neural network extracts spatio-temporal features from the 16 frames adjacent to each key frame. These features fuse the motion characteristics of the frames with their content across the time sequence and fully reflect how the frames change over time. As a result, the spatio-temporal features extracted for motion shots are more representative, which further improves retrieval accuracy.

(3) A video similarity matching rule that combines the spatio-temporal and deep features of a video is established. During similarity matching, the static features (deep features) and dynamic features (spatio-temporal features) of the key frames are fused and combined with a sliding window to achieve asymmetric matching of video feature sequences. This completes video content retrieval based on key frames and short videos.

Combining the three algorithms proposed in this thesis (key frame extraction based on deep features, a spatio-temporal feature extraction framework based on a 3D convolutional neural network, and a video similarity matching rule that fuses spatio-temporal and deep features), a high-precision prototype of a content-based video retrieval system was developed and implemented. Tests on UCF-101, a standard action video dataset, show that when the system's recall reaches 90%, its average precision exceeds 84%, higher than the 62% accuracy of the SIFT (scale-invariant feature transform) feature algorithm. With GPU hardware acceleration, the system's average retrieval time does not exceed 3 seconds, and both its retrieval accuracy and speed are better than those of traditional video retrieval methods. This work is a useful exploration of applying deep learning to video retrieval.
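The key frame extraction idea in item (1), selecting frames by how much their CNN features differ, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the per-frame deep features are assumed to be already extracted (toy vectors stand in for them), the difference measure is assumed to be cosine distance, each frame is compared with the most recent key frame, and `threshold` is a hypothetical tuning parameter.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # treat all-zero vectors as identical
    return 1.0 - dot / norm


def extract_key_frames(features: List[Sequence[float]],
                       threshold: float = 0.3) -> List[int]:
    """Return indices of frames whose deep feature differs from the most
    recent key frame by more than `threshold` (a hypothetical parameter)."""
    if not features:
        return []
    key_frames = [0]  # assumption: the first frame always opens a segment
    for i in range(1, len(features)):
        if cosine_distance(features[key_frames[-1]], features[i]) > threshold:
            key_frames.append(i)
    return key_frames


# Toy 2-D "deep features": two near-duplicate pairs and one scene change back.
frames = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0], [1.0, 0.01]]
print(extract_key_frames(frames))  # [0, 2, 4]
```

Comparing against the last key frame rather than only the previous frame is one way to avoid missing slow, gradual transitions; whether the thesis does this is not stated in the abstract.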
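Item (2) feeds the 16 frames adjacent to a key frame into a 3D convolutional neural network. The sketch below shows two pieces of that step under stated assumptions. `clip_indices` picks a 16-frame window roughly centred on the key frame and shifts it to stay inside the video (the exact window placement is an assumption). `conv3d_valid` is a naive single-channel 3D convolution, showing how a kernel that spans several frames responds to change over time as well as to spatial content. It illustrates the operation only and is not the thesis's network.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [time][row][col]


def clip_indices(key_frame: int, num_frames: int, clip_len: int = 16) -> List[int]:
    """Indices of `clip_len` consecutive frames roughly centred on
    `key_frame`, shifted so the window stays inside [0, num_frames)."""
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    start = key_frame - clip_len // 2
    start = max(0, min(start, num_frames - clip_len))
    return list(range(start, start + clip_len))


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation with 'valid' padding.
    Each output value mixes kt consecutive frames, so it can respond to
    motion between frames, not just to the appearance of one frame."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# A 2x1x1 temporal-difference kernel fires only where frames change.
video = [[[0.0, 0.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 0.0]],
         [[1.0, 1.0], [1.0, 1.0]]]
diff_kernel = [[[-1.0]], [[1.0]]]
print(conv3d_valid(video, diff_kernel))
print(clip_indices(50, 100))  # frames 42..57
```

A real 3D CNN stacks many such multi-channel kernels with nonlinearities and pooling; the point here is only that the kernel's time extent is what lets the features encode motion.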
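The matching rule in item (3), fusing static (deep) and dynamic (spatio-temporal) key frame features and sliding a shorter query over a longer video, might look like the sketch below. The abstract does not give the fusion formula, so the following are assumptions: per-key-frame similarity is a weighted sum of two cosine similarities with a hypothetical weight `alpha`, a window's score is the mean over aligned key frames, and a video's score is its best window.

```python
import math
from typing import List, Sequence, Tuple

# One key frame: (static deep feature, dynamic spatio-temporal feature).
KeyFrame = Tuple[Sequence[float], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def frame_similarity(q: KeyFrame, d: KeyFrame, alpha: float = 0.5) -> float:
    """Weighted fusion of static and dynamic similarity (alpha is hypothetical)."""
    return alpha * cosine(q[0], d[0]) + (1.0 - alpha) * cosine(q[1], d[1])


def sliding_window_match(query: List[KeyFrame], video: List[KeyFrame],
                         alpha: float = 0.5) -> Tuple[float, int]:
    """Slide the shorter query key-frame sequence over the video's sequence
    (asymmetric matching); return (best mean similarity, best offset)."""
    m, n = len(query), len(video)
    if m == 0 or m > n:
        raise ValueError("query must be non-empty and no longer than the video")
    best_score, best_offset = -math.inf, -1
    for offset in range(n - m + 1):
        score = sum(frame_similarity(query[i], video[offset + i], alpha)
                    for i in range(m)) / m
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


video = [([1, 0], [1, 0]), ([0, 1], [0, 1]), ([1, 1], [1, 0]), ([0, 1], [1, 1])]
query = [([0, 1], [0, 1]), ([1, 1], [1, 0])]
score, offset = sliding_window_match(query, video)
print(offset, round(score, 6))  # 1 1.0
```

Ranking database videos by this best-window score lets a short clip (or a single key frame, with `m = 1`) be located inside a longer video.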
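The reported result ("when recall reaches 90%, the average precision is higher than 84%") can be made concrete with a small calculation. The abstract does not spell out the evaluation protocol, so this sketch assumes one common reading: for each query, walk down the ranked result list until recall first reaches 90%, take the precision at that cut-off, then average over queries. The function names are hypothetical, and the toy data is not from the thesis.

```python
from typing import Sequence, Tuple


def precision_at_recall(ranked_relevance: Sequence[bool], num_relevant: int,
                        target_recall: float = 0.9) -> float:
    """Precision at the shallowest rank where recall >= target_recall.
    Returns 0.0 if the ranked list never reaches the target recall."""
    if num_relevant <= 0:
        raise ValueError("num_relevant must be positive")
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
        if hits / num_relevant >= target_recall:
            return hits / rank
    return 0.0


def mean_precision_at_recall(runs: Sequence[Tuple[Sequence[bool], int]],
                             target_recall: float = 0.9) -> float:
    """Average the per-query precision at the target recall."""
    values = [precision_at_recall(r, n, target_recall) for r, n in runs]
    return sum(values) / len(values)


# Query A: 10 relevant videos; the 9th relevant hit appears at rank 12.
run_a = [True] * 5 + [False] + [True] * 2 + [False, False] + [True, True]
# Query B: 10 relevant videos; the first 9 results are all relevant.
run_b = [True] * 9
print(precision_at_recall(run_a, 10))                         # 0.75
print(mean_precision_at_recall([(run_a, 10), (run_b, 10)]))   # 0.875
```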
Keywords/Search Tags: deep learning, video retrieval, key frame extraction, spatio-temporal feature, similarity matching