
The Design And Implementation Of Few-shot Video Classification Based On Deep Learning Framework

Posted on: 2022-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: J L He
GTID: 2518306341450604
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer and mobile internet technology, watching and sharing videos has become part of daily life, and video data has become an important information carrier. Processing this volume of video manually is unrealistic. Traditional neural network models require large amounts of annotated data, and manual annotation is time-consuming and laborious. Video classification in few-shot scenarios has therefore become an active research problem in computer vision. Few-shot video classification refers to completing the video classification task with only a few labeled samples. This thesis studies two basic tasks in video classification, action classification and background detection, and how to perform both when few video annotations are available.

Video data naturally has spatial and temporal attributes, and fully extracting information along both dimensions is particularly important for video classification. Existing work fails to consider the relative relationships between frames and the importance of individual frames, so the temporal characteristics of a video cannot be fully extracted in a few-shot setting.

First, to address insufficient feature extraction and utilization in few-shot action recognition, this thesis proposes a spatial feature representation method based on a siamese network, which combines ResNet-18 and AlexNet to extract the spatial features of the video. Second, a temporal feature extraction method based on a sparse attention mechanism is proposed. Its core idea is to highlight the influence of key frames while computing the relative relationships between video frames, so as to fully extract the temporal features of the video. Finally, building on these extracted features, a deep relationship module based on the idea of alignment is proposed to make full use of the temporal and spatial features of each sample.

For few-shot background detection, this thesis proposes a few-shot background detection algorithm based on a siamese network. Experimental results on multiple real datasets show that the few-shot action recognition algorithm based on the sparse attention mechanism and the few-shot background detection algorithm based on the siamese network make full use of the extracted features and significantly improve classification accuracy.
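
The abstract does not say how the sparse attention mechanism is made sparse, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes top-k masking: each frame attends to every frame through scaled dot-product scores, but only its k highest-scoring frames keep a nonzero weight, so that a few key frames dominate the temporal feature. The function names, the value of `k`, and the toy two-dimensional frame features are all hypothetical; in practice the per-frame features would come from a spatial backbone.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sparse_softmax(scores, k):
    """Softmax over the k largest scores; every other weight is exactly 0.

    Assumption: top-k masking stands in for whatever sparsification
    the thesis actually uses.
    """
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top = max(scores[i] for i in keep)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def sparse_temporal_attention(frames, k):
    """Re-express each frame as a weighted mix of its k most related frames."""
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in frames:
        # Relative relationship between this frame and every frame in the video.
        scores = [dot(query, key) / scale for key in frames]
        w = sparse_softmax(scores, k)
        weights.append(w)
        attended.append(
            [sum(w[j] * frames[j][d] for j in range(len(frames))) for d in range(dim)]
        )
    return attended, weights


def video_feature(frames, k):
    """Average the attended frames into a single video-level feature vector."""
    attended, _ = sparse_temporal_attention(frames, k)
    return [sum(f[d] for f in attended) / len(attended) for d in range(len(attended[0]))]


# Toy example: four frames with hand-written 2-D features (hypothetical values).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
attended, weights = sparse_temporal_attention(frames, k=2)
feature = video_feature(frames, k=2)
```

With `k=2`, each row of `weights` has exactly two nonzero entries that sum to 1, so each frame is summarized by only its two most related frames. A few-shot classifier could then compare such video-level features between a query video and the labeled support videos.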
Keywords/Search Tags: few-shot learning, action recognition, siamese network, attention mechanism