Research on Weakly Supervised Learning for Video Segmentation and Action Recognition

Posted on: 2021-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: C P Ge
GTID: 2428330623974821
Subject: Computational Mathematics
Abstract/Summary:
Action recognition and temporal segmentation are important video understanding tasks, widely used in video surveillance, video retrieval, autonomous driving and other fields. The goal is not only to identify which actions occur in an untrimmed video, but also to predict the start and end time of each action. Most existing action recognition algorithms rely on strongly supervised learning with frame-by-frame labels, and such detailed video annotation requires a great deal of manpower and material resources. To reduce the annotation workload, this thesis studies action recognition and temporal segmentation under weak supervision. It examines two types of machine learning algorithms together with current action recognition algorithms, and combines them to improve the accuracy of action classification and temporal segmentation. The main contents and innovations of this thesis are:

1. In video action recognition and temporal segmentation, a common approach imitates image-based object detection: it generates temporal proposals for candidate actions and then learns from the proposal segments. However, proposal-based learning consumes substantial computing resources and cannot recognize and localize actions in real time in surveillance video. This thesis instead splits each video into segments of equal length, extracts features from each segment, and then performs weakly supervised classification learning on those features. This approach does not require frame-by-frame action labels, which greatly reduces the annotation workload.

2. An action recognition and temporal segmentation algorithm combined with self-paced learning is proposed. Current action recognition and localization algorithms usually select samples at random, predict on the video with the current network parameters, compute the gradient of a defined loss function from the prediction, and then update the network parameters. However, this procedure usually requires a strong classifier, because a weak classifier depends heavily on the initial network parameters. During training, the simpler a sample is, the more easily the classifier recognizes it and the more stable the learning becomes. This thesis uses self-paced learning to simulate the human cognitive process of learning from “simple” to “complex” samples in weakly supervised temporal action localization.

3. An action recognition and temporal segmentation method based on feature transformation is proposed. Self-supervised methods have been successful in video representation research, but combining self-supervision with weakly supervised action recognition and localization has not been studied. In this thesis, self-supervised learning of feature transformations is performed by applying transformations such as flipping and symmetry to video features and providing the applied transformation to the neural network as the label. The parameters of the self-supervised network are then loaded into the classification network as a pre-trained model for weakly supervised action recognition and temporal localization.

4. The two proposed algorithms are validated experimentally on the Thumos14 human activity dataset. Compared with state-of-the-art algorithms, the method based on self-paced learning improves action localization accuracy by about 1% and action recognition accuracy by 0.2%. The algorithm combined with feature transformation improves action localization accuracy by about 3% and action recognition accuracy by 1.3%.
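The easy-to-hard sample selection described in point 2 can be illustrated with a minimal sketch. The abstract does not state which self-paced formulation the thesis uses, so the code below assumes the common hard-threshold scheme: a sample takes part in training only while its loss is below a threshold, and the threshold grows over epochs so harder samples are admitted later. The function names, the geometric schedule and the loss values are all hypothetical and are not taken from the thesis.

```python
def self_paced_weights(losses, threshold):
    """Hard self-paced weights: 1.0 if a sample's loss is below the threshold, else 0.0."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def self_paced_loss(losses, threshold):
    """Mean loss over the currently selected ("easy") samples; 0.0 if none are selected."""
    weights = self_paced_weights(losses, threshold)
    selected = sum(weights)
    if selected == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / selected


def threshold_schedule(initial, growth, epochs):
    """Grow the threshold geometrically so harder samples are admitted over time (assumed schedule)."""
    return [initial * growth ** epoch for epoch in range(epochs)]


# Hypothetical per-video losses, for illustration only.
example_losses = [0.2, 0.9, 1.5, 0.4, 2.3]
for epoch, lam in enumerate(threshold_schedule(initial=0.5, growth=2.0, epochs=3)):
    weights = self_paced_weights(example_losses, lam)
    print(epoch, lam, weights, self_paced_loss(example_losses, lam))
```

In a real training loop the per-sample losses would come from the classification network at each epoch, and only the selected samples would contribute to the gradient; here they are fixed so that the effect of the growing threshold is visible in isolation.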
Keywords/Search Tags: Computer vision, Weakly supervised learning, Action recognition, Action localization, Self-paced learning, Feature transformation