
Research On Representation Learning For Early Action Prediction

Posted on: 2024-04-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Q Wang
GTID: 1528307202994869
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In computer vision, research on human behavior has long been a focus of attention. Unlike traditional human action recognition, early action prediction aims to recognize an action's category before the action has been fully carried out. As an important research topic in understanding human intentions, it has significant theoretical and practical value in areas such as human-computer interaction, human-machine collaboration systems, intelligent video surveillance, autonomous driving, and driver assistance systems. However, early action prediction is more challenging because only the initial part of an action can be observed and the complete execution process is not available. The main difficulties of the task are: the lack of global information about the complete action in the input video sequences, the information gap between video data with different observation ratios, and the high similarity between different actions in their early stages. These issues seriously affect the accuracy of early action prediction methods. To address them, this study focuses on representation learning-based methods for early action prediction from the perspective of feature representation. The main contributions and innovations are summarized as follows:

1. Because only the initial part of an action is observed, the input video sequences lack global information about the complete action. To address this problem, we propose a guidance-aware approach for early action prediction. The key contributions are as follows:
(1) We propose a novel Guidance Aware Network (GA-Net) that uses global guidance information to improve network performance.
(2) We propose a Guided Metric Learning Module (GMLM) that uses global features as guidance to enhance the discriminative ability of partially observed action features at the sample level.
(3) We design a Distribution Alignment Module (DAM) that aligns partial action features with global action features at the distribution level, thereby guiding the partial features in the high-level feature space.
Experimental results on multiple publicly available datasets demonstrate that the proposed method makes effective use of global guidance information and achieves strong prediction performance. In particular, it improves prediction on partially observed action video sequences with low observation ratios.

2. Because observation ratios vary across action video sequences, the sequences carry different amounts of information, which creates an information gap between videos with different observation ratios. To address this problem, we propose a temporally-observed domain contrastive learning method based on conditional temporal observation domains. The main contributions are as follows:
(1) The task is reformulated by treating video sequences with different observation ratios as samples from different temporal observation domains. The information gap caused by different observation ratios thus becomes a domain gap between observation domains, and the goal becomes reducing the gap between the low and high observation domains.
(2) A novel Temporally-Observed Domain cOntrastive Network (TODONet) is introduced to reduce the domain gap between the low and high observation domains, enhancing the discriminative information in low observation domain samples (i.e., video data with low observation ratios).
(3) A novel conditional contrastive learning algorithm is proposed. It decouples category information from observation ratio information, reduces the distance between anchor samples and positive samples, and increases the distance between anchor samples and negative samples. This effectively reduces the domain gap between different observation domains.
Experimental results on multiple publicly available datasets demonstrate that the proposed method improves the prediction performance of the network.

3. Different actions are often highly similar in their early stages, which leads to poor prediction performance. To address this problem, we propose an early action prediction method based on meta negative sample learning. The main contributions are as follows:
(1) A novel Meta Negative Network (Magi-Net) is proposed. It is designed on the basis of contrastive learning and uses representation learning to further exploit the useful information that features carry in the representation space.
(2) New positive and negative sample collection modules are designed around the characteristics of the task: the Observation Augmented Positive Sampling module and the Meta Negative Sampling module.
(3) A new optimization strategy for meta negative sample learning, the Meta Negative Sample Optimization Strategy (MetaSOS), is proposed. It uses meta-learning to update the network parameters of Magi-Net.
Experimental results on multiple publicly available datasets demonstrate that the proposed method effectively addresses the difficulty of classifying samples that are highly similar to one another, thereby improving the recognition performance of the network.
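
The contrastive objective mentioned in contribution 2, pulling an anchor toward its positive sample while pushing it away from negative samples, can be illustrated with a generic InfoNCE-style loss. This is only a minimal sketch under stated assumptions, not the conditional contrastive algorithm of the dissertation: the function name `info_nce_loss`, the `temperature` parameter, the use of cosine similarity, and the toy feature vectors are all hypothetical choices made for illustration. In particular, the choice of positive (a fuller observation of the same action) is an assumption about how such pairs might be formed.

```python
import numpy as np


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for a single anchor (illustrative only).

    anchor:    (d,) feature vector, e.g. a low-observation-ratio sample
    positive:  (d,) feature vector assumed to share the anchor's category
    negatives: (k, d) feature vectors assumed to come from other categories
    """
    def normalize(x):
        # L2-normalize so that dot products become cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    p = normalize(np.asarray(positive, dtype=float))
    n = normalize(np.asarray(negatives, dtype=float))

    pos_logit = float(a @ p) / temperature   # similarity to the positive
    neg_logits = (n @ a) / temperature       # similarities to each negative
    logits = np.concatenate(([pos_logit], neg_logits))

    # Numerically stable log-sum-exp over the positive and all negatives.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())

    # The loss is small when the positive dominates the negatives.
    return float(log_denominator - pos_logit)


# Toy usage with made-up 2-D features (hypothetical values).
anchor = np.array([1.0, 0.0])            # partially observed action
good_positive = np.array([0.9, 0.1])     # same action, fuller observation
bad_positive = np.array([0.0, 1.0])      # poorly aligned "positive"
negatives = np.array([[0.0, 1.0], [-1.0, 0.0]])

loss_aligned = info_nce_loss(anchor, good_positive, negatives)
loss_misaligned = info_nce_loss(anchor, bad_positive, negatives)
```

In this toy setting the loss is lower when the positive lies close to the anchor, which is the behavior a contrastive objective rewards. How the dissertation conditions the loss on category and observation ratio is not specified in the abstract and is not reproduced here.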
Keywords/Search Tags: representation learning, metric learning, contrastive learning, action recognition, early action prediction, meta-learning