
Research On Group Behavior Recognition Based On Multi-stream Architecture And Long Short-term Memory Network

Posted on: 2021-05-24  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Hu  Full Text: PDF
GTID: 2428330611488435  Subject: Computer technology
Abstract/Summary:
Group behavior recognition in video is a challenging task and has become a research hotspot in computer vision. Group behavior has a more complex structure than single-person behavior: interference, occlusion, and interaction between people within a group all affect the final recognition result, so single-person behavior recognition techniques cannot be applied directly to group behavior recognition. At present there are two main difficulties. The first is how to exploit multiple visual cues in complex scenes and fuse them to obtain more discriminative features. The second is how to model the contextual characters in the group to obtain long-term temporal context between frames. Most previous methods do not offer a practical solution to both problems. This thesis therefore proposes a context modeling framework based on a two-stream TSN (Temporal Segment Networks) architecture and an LSTM (Long Short-Term Memory) network that addresses both problems at the same time.

For the former, multiple visual cues in the video are used, covering both appearance and motion characteristics. To capture the optical flow information of people and scenes in the video, a two-stream TSN convolutional neural network is adopted and extended to the problem of group behavior recognition. Inspired by traditional global and local feature extraction methods, the framework attends to local information while placing greater emphasis on the effectiveness of global features. Building on single-person behavior recognition techniques, it eliminates the interference of irrelevant people and extracts the appearance information of the main characters and the scene. To address occlusion between people, the motion information of the main characters and of the entire scene is also taken into account. Feature extraction is performed by two TSN networks: a local TSN extracts local feature representations, a global TSN extracts global feature representations, and the local and global appearance and motion features are then combined to obtain more discriminative features.

For the latter, by exploiting the temporal sampling of TSN and the sequence modeling of the LSTM network, long-term temporal dependencies in the video are captured to generate a comprehensive contextual feature representation for group behavior recognition. The classification results of the two softmax layers are then fused to produce the final group behavior label.

Finally, the algorithm is verified and analyzed on two group behavior datasets, CAD1 and CAD2, achieving average recognition rates of 93.2% and 95.7%, respectively. Compared with traditional group behavior recognition methods, the proposed model is greatly improved; compared with current mainstream group behavior recognition methods, it also shows better performance, which demonstrates the effectiveness and stability of the algorithm.
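The late-fusion step described above — combining the classification results of the two softmax layers — can be sketched in NumPy as a weighted average of the two streams' class probabilities. This is a minimal illustration, not the thesis's implementation: the logits, the four-class setup, and the equal weighting are invented for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_streams(local_logits, global_logits, w_local=0.5):
    """Late fusion: weighted average of the class probabilities
    produced by the local and global stream classifiers."""
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    return w_local * p_local + (1.0 - w_local) * p_global

# Toy logits for four hypothetical group-activity classes
local_logits = np.array([2.0, 0.5, 0.1, -1.0])   # local TSN + LSTM stream
global_logits = np.array([1.5, 1.8, 0.2, -0.5])  # global TSN + LSTM stream

probs = fuse_streams(local_logits, global_logits)
pred = int(np.argmax(probs))  # index of the predicted group behavior
```

In this toy example the local stream favors class 0 strongly enough that the fused prediction is also class 0, even though the global stream slightly prefers class 1; the fusion weight `w_local` would in practice be a tunable hyperparameter.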
Keywords/Search Tags: group behavior recognition, fusion of multiple visual cues, global-local model, interactive context modeling, long short-term memory network