Research On Algorithm Of Zero-Shot Action Recognition Based On Effective Visual Features

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Qi
Full Text: PDF
GTID: 2558307154974939
Subject: Software engineering
Abstract/Summary:
With the development of the Internet and deep learning, the amount of multimedia data such as video and audio keeps growing, which drives the development of deep-learning-based video classification. Supervised action recognition has achieved strong performance. However, collecting and annotating video data is laborious and costly, and supervised models cannot effectively recognize unseen categories. Zero-Shot Action Recognition (ZSAR) was proposed to overcome these limitations: it learns mapping functions between visual features and semantic features from seen data, and then uses these mappings to classify unseen video data. However, most ZSAR studies fail to highlight the salient information in a video sequence because of redundancy, which degrades performance. Moreover, there is no unified, structured guidance framework for the inference mechanisms of ZSAR. In this thesis, we use the Energy-Based Model (EBM) as the guidance framework for the inference mechanism of ZSAR. Under the guidance of the EBM, the thesis proposes two ZSAR algorithms, based on spatial saliency and semantic saliency, respectively. The algorithm based on spatial saliency combines a Dynamics Sampling Algorithm and a Saliency Detection Mechanism to construct the visual space. Based on the Histogram of Oriented Gradients (HOG) features of frames, the former selects dissimilar frames as a frame subset; the latter obtains temporal segments of high saliency as effective visual features. In addition, to avoid semantic loss while constructing the visual space, we propose the algorithm based on semantic saliency, which employs the Video Temporal Summarized Module. Based on the deep features of frames, this module selects frame subsets in an unsupervised way; combined with the Saliency Detection Mechanism, it establishes the visual space. Meanwhile, the Semantic Encoding Module takes category names as input and generates category semantic features as output variables under the guidance of the EBM. The Energy-based Matching Module is designed to measure the compatibility between each of the two kinds of visual features and the output variables. We conduct experiments on the HMDB51 and UCF101 datasets to evaluate performance. The results indicate that the algorithm based on spatial saliency achieves results comparable to those of similar methods, while the algorithm based on semantic saliency achieves state-of-the-art performance among similar methods. The proposed effective visual features and the structural guidance of the EBM could be employed in other ZSAR frameworks, which offers new inspiration for ZSAR research. In future work, we will consider extracting effective visual features based on other attributes to improve ZSAR performance under the structural guidance of the EBM.
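
The energy-based inference described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual energy function, so the code below assumes a common choice: energy is the negative cosine similarity between a video's visual feature (assumed to be already mapped into the semantic space) and a category's semantic feature, and the predicted unseen category is the one with the lowest energy. The names `cosine_energy` and `predict_unseen`, and the toy three-dimensional features, are hypothetical and exist only for this example.

```python
import math


def cosine_energy(visual, semantic):
    """Energy of a (video, category) pair.

    Assumed form: negative cosine similarity, so a more compatible
    pair gets a lower energy. This is not the thesis's stated function.
    """
    dot = sum(v * s for v, s in zip(visual, semantic))
    norm = math.sqrt(sum(v * v for v in visual)) * math.sqrt(
        sum(s * s for s in semantic)
    )
    # Treat a zero vector as neutral (energy 0) to avoid division by zero.
    return -dot / norm if norm > 0 else 0.0


def predict_unseen(visual, class_embeddings):
    """Return the category with the lowest energy and all energies."""
    energies = {
        name: cosine_energy(visual, emb)
        for name, emb in class_embeddings.items()
    }
    return min(energies, key=energies.get), energies


# Toy semantic features for three unseen categories (made-up values).
unseen_classes = {
    "archery": [1.0, 0.0, 0.0],
    "juggling": [0.0, 1.0, 0.0],
    "surfing": [0.0, 0.0, 1.0],
}

# Toy visual feature of one test video, already in the semantic space.
video_feature = [0.1, 0.9, 0.2]

label, energies = predict_unseen(video_feature, unseen_classes)
print(label)
```

In this toy case the video feature points mostly along the second axis, so "juggling" receives the lowest energy. A learned energy function and real visual and semantic encoders would replace the fixed cosine form and the hand-written vectors.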
Keywords/Search Tags: Zero-Shot Action Recognition, Spatial Saliency, Semantic Saliency, Energy-Based Model