Deep Networks For Video Recognition

Posted on: 2021-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: W H Wu
Full Text: PDF
GTID: 2428330623465014
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Video recognition has attracted great research interest in the computer vision community due to its importance in real-world applications such as video surveillance, video search, and video recommendation. With the rapid development of the Internet and mobile devices, the volume of video data has exploded in recent years and now far exceeds the capacity of conventional manual processing. Video understanding technology is therefore in wide demand in industry, and video recognition has become one of the most active research topics.

Untrimmed videos pose a critical challenge for recognition, since not all frames correspond to the video's ground-truth label. To address this, we propose a multi-agent reinforcement learning framework for discriminative frame sampling. Using a single RGB model, our approach achieves performance comparable to the ActivityNet v1.3 champion submission, which relied on multi-modal, multi-model fusion, and sets new state-of-the-art results on YouTube Birds and YouTube Cars.

Meanwhile, many research efforts focus on the speed-accuracy trade-off in trimmed video recognition. Unlike these works, which mainly design efficient network architectures, we propose dynamic inference, which improves recognition efficiency by exploiting the fact that videos differ in how easily they can be distinguished. Extensive experiments on several popular trimmed video datasets verify that our solution significantly reduces computation cost while maintaining excellent recognition accuracy, demonstrating the advantage of dynamic inference for video recognition.
Keywords/Search Tags:Video Understanding, Pattern Recognition, Deep Learning