
Collective Activity Recognition Based On Video Deep Reinforcement Learning

Posted on: 2020-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2428330590495450
Subject: Software engineering
Abstract/Summary:
Understanding the behavioral semantics of groups of people in video is a difficult problem in artificial intelligence. The main task is to integrate sequential action cues and infer the behavioral semantics of the group using deep learning, reinforcement learning, and related algorithms. In recent years, pattern recognition technology has largely met the requirements for recognizing individual actions in images, but research on collective activity recognition in video remains underdeveloped. This thesis aims to identify the behavioral semantics of collective activity in video. First, a feature extraction method based on key semantics is designed to extract multi-dimensional fusion features of the main video content. Next, a human detection method based on target candidate regions is designed, which handles classification and localization in parallel to locate the group members in the video. Finally, a semantic extraction method based on spatio-temporal trajectories is designed to complete the understanding of collective activity. The innovations of this thesis are mainly reflected in the following three aspects:
(1) Cluster centers are selected from the hierarchical clustering results of the video frames, and the K-means algorithm is used to optimize those results and extract the key semantic sequence of the video. A bidirectional feature processing channel then fuses the multi-level video features to complete multi-dimensional fusion feature extraction. The key frame extraction experiment is carried out on the KTH dataset; the results show that the content-based key frame extraction algorithm designed in this thesis achieves a high key frame recall rate and effectively focuses on the key semantics of the video. The feature extraction experiment is carried out on the COCO dataset; the results show that the video features extracted by the feature fusion algorithm based on a convolutional neural network make effective use of low-level location information and have stronger expressive power.
(2) A duplicate removal network filters out redundant candidate boxes, and the classification confidence score is combined with the classification probability to select the target candidate box. A multi-task loss structure is introduced for training, so that category classification and position regression of the target bounding box are processed in parallel. The target detection experiment is carried out on the COCO dataset; the results show that the proposed target bounding box extraction algorithm based on multi-dimensional fusion features achieves better detection performance and regresses the target position more accurately. The person localization experiment is carried out on the Volleyball dataset; the results show that the video group member localization algorithm based on person features can accurately locate each person with a single adjustment stage while reducing the computational cost.
(3) Mask position matching features are used to match people between frames, and a behavioral spatio-temporal association model is constructed to extract the behavioral semantics of the group in the video through a two-layer recurrent neural network. The semantic extraction experiment is carried out on the Volleyball dataset. The results show that the trajectory tracking algorithm for group members based on motion features designed in this thesis tracks trajectories continuously and accurately and is better suited to group scenes. The results also show that the group behavior association algorithm effectively fuses spatio-temporal cues and achieves high semantic extraction accuracy.
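The key frame selection described in aspect (1) can be illustrated with a small sketch. The abstract does not specify the frame features, the linkage rule, or how a key frame is chosen from a cluster, so everything below is an assumption made for illustration: each frame is a plain feature vector, hierarchical clustering uses centroid linkage, its cluster means seed K-means, and the frame closest to each refined center is taken as a key frame. The function names (`agglomerative_centers`, `kmeans_refine`, `select_key_frames`) are hypothetical and not taken from the thesis.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def agglomerative_centers(features, k):
    """Merge the two closest clusters (centroid linkage, an assumption)
    until k clusters remain, then return their means."""
    clusters = [[f] for f in features]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]


def kmeans_refine(features, centers, iters=20):
    """Standard K-means updates, started from the hierarchical centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for f in features:
            nearest = min(range(len(centers)), key=lambda c: dist(f, centers[c]))
            groups[nearest].append(f)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def select_key_frames(features, k):
    """Return sorted indices of the frames closest to each refined center."""
    centers = kmeans_refine(features, agglomerative_centers(features, k))
    keys = {
        min(range(len(features)), key=lambda i: dist(features[i], c))
        for c in centers
    }
    return sorted(keys)
```

Calling `select_key_frames(features, k)` on a list of per-frame feature vectors returns the indices of the chosen key frames. Seeding K-means from the hierarchical result avoids random initialization, which is one plausible reading of "optimizing" the hierarchical clustering; the thesis itself may combine the two steps differently.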
Keywords/Search Tags:Deep Reinforcement Learning, Collective Activity Semantic Extraction, Multi-Target Tracking, Human Detection, Behavior Association