
Compressed Video Classification Based On Two-stream Convolutional Neural Network

Posted on: 2022-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
GTID: 2518306509454684
Subject: Software engineering

Abstract/Summary:
Traditional video classification methods are based on hand-crafted features and achieved relatively good performance in early tasks. However, they rely heavily on hand-designed feature extraction algorithms and task-specific knowledge, so the field has gradually shifted to deep learning-based approaches. A classical deep learning solution is the two-stream convolutional neural network, which performs well in action recognition: the network is divided into a spatial stream and a temporal stream, which take RGB video frames and dense optical flow as inputs, respectively. This method has two drawbacks. First, using dense optical flow as the input of the temporal stream is computationally expensive: current extraction algorithms are extremely time-consuming and cannot meet the requirements of real-time tasks. Second, the two networks are trained independently and fused only at the final prediction-score stage, without considering the connection between the temporal and spatial streams. To address these problems, this thesis makes the following contributions:

(1) Optical flow is avoided in the temporal stream; instead, the Motion Vectors (MVs) extracted from the compressed domain are used as temporal features, which greatly reduces extraction time. Some existing algorithms also use MVs as features, but they use only the raw MVs, which results in lower accuracy. This thesis proposes a motion enhancement strategy based on accumulating the motion field of the compressed video, which strengthens the motion information and temporal continuity of the accumulated MVs and thus improves performance.

(2) A fusion strategy based on different temporal resolutions is proposed: the spatial and temporal streams use different temporal resolutions so that each stream learns its own features more specifically. In addition, temporal-stream features from different stages are fused into the spatial-stream features inside the network to obtain more effective representations.

(3) A motion enhancement model based on high-level spatial features is proposed. Its main idea is knowledge distillation: spatial information is transferred into the temporal stream so that the temporal stream obtains more information.

Experimental results show that the strategies proposed in this thesis greatly improve the accuracy obtained with MVs, and the final recognition accuracy is maintained without using optical flow.
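To make the accumulation idea in contribution (1) concrete, the following minimal NumPy sketch accumulates per-frame motion vector fields back to a reference frame. The function name, sign convention, and nearest-neighbor lookup are illustrative assumptions; the abstract does not specify the thesis's exact accumulation rule.

import numpy as np

def accumulate_motion_vectors(mv_fields):
    # mv_fields: list of (H, W, 2) float arrays; mv_fields[t][y, x] is the
    # (dx, dy) displacement from frame t back to its reference frame t-1.
    # The reference I-frame's field (index 0) is assumed to be all zeros.
    h, w, _ = mv_fields[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    accumulated = [mv_fields[0]]
    for mv in mv_fields[1:]:
        prev = accumulated[-1]
        # Trace each pixel to its reference location in the previous frame
        # (nearest-neighbor), then add the motion already accumulated there,
        # so every field points all the way back to the I-frame.
        ref_x = np.clip(xs + np.rint(mv[..., 0]).astype(int), 0, w - 1)
        ref_y = np.clip(ys + np.rint(mv[..., 1]).astype(int), 0, h - 1)
        accumulated.append(mv + prev[ref_y, ref_x])
    return accumulated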
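Contribution (3) is described only as being "based on knowledge distillation", so the sketch below shows a generic soft-label distillation loss (temperature-scaled KL divergence plus cross-entropy) in PyTorch, with the spatial (RGB) stream as teacher and the MV-based temporal stream as student. The loss form, temperature, and weighting are assumptions for illustration; the thesis's actual loss and fusion points may differ.

import torch
import torch.nn.functional as F

def distillation_loss(mv_logits, rgb_logits, labels, T=4.0, alpha=0.5):
    # Hard-label cross-entropy on the temporal (student) stream.
    ce = F.cross_entropy(mv_logits, labels)
    # Soft targets from the frozen spatial (teacher) stream at temperature T.
    kd = F.kl_div(
        F.log_softmax(mv_logits / T, dim=1),
        F.softmax(rgb_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kd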
Keywords/Search Tags: motion vectors, two streams, knowledge distillation, temporal resolution