
Research on Behavior Recognition Based on a Multi-channel Spatio-temporal Eigenflow CNN-LSTM Model

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: H C Kang
Full Text: PDF
GTID: 2428330623973465
Subject: Safety engineering
Abstract/Summary:
Human unsafe behavior is one of the main causes of accidents in production processes. Traditional video surveillance systems that rely on human monitoring cannot detect workers' unsafe behavior in a timely and effective manner. Computer vision-based behavior recognition can identify operators' behavior in surveillance video automatically and without contact, improving the efficiency and accuracy of video surveillance and reducing accidents caused by human error. Existing computer vision-based behavior recognition methods, however, do not adequately extract both the spatial and the temporal characteristics of behavioral videos, so they struggle to make full use of spatial appearance information and inter-frame temporal information, and their input data is generally limited to video frames and optical flow diagrams.

To improve the accuracy of existing behavior recognition methods, this dissertation adopts a deep learning-based approach for its stronger feature extraction and generalization capabilities, and uses a CNN-LSTM model to capture the visual appearance information and temporal relationships of video behavior. A review of the literature on CNN-LSTM behavior recognition identified two directions for improving recognition accuracy: the type of input data and the model's spatiotemporal modeling capability. On this basis, the following work was carried out.

Behavior recognition model based on human skeleton diagrams from video and CNN-LSTM. To add a new modality that complements the input data of existing models, this dissertation extracts human skeleton diagrams from the original video frames to characterize changes in human pose and motion. The CNN-LSTM model is used to exploit both the intra-frame spatial information and the inter-frame temporal information of the skeleton diagrams. To extract the spatial characteristics of behavioral video more effectively, Inception V3 replaces the convolutional neural network used in the existing method, which indirectly improves the overall performance of the CNN-LSTM model.

Behavior recognition based on a multi-channel CNN-LSTM fusion model. So that the original video frames, optical flow diagrams, and human skeleton diagrams can complement one another, a multimodal, multi-channel CNN-LSTM model is established, and several late-fusion strategies, such as weighted fusion and adaptive fusion, are used to combine the multimodal inputs and improve recognition accuracy.

Finally, experiments were performed on the Caffe deep learning platform to evaluate the feature extraction performance of Inception V3, the recognition performance of the CNN-LSTM model on skeleton diagrams, and the recognition performance of the multimodal CNN-LSTM model. The experiments show that the Inception V3 network effectively improves spatial feature extraction for behavioral videos, that the CNN-LSTM model can effectively recognize behavior from skeleton diagrams, and that adding skeleton diagrams to the multimodal CNN-LSTM model improves the recognition accuracy of the existing CNN-LSTM model.
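The abstract names weighted fusion as one of the late-fusion strategies but does not give its formula. As a rough illustration only, the sketch below assumes each channel (video frames, optical flow, skeleton diagrams) outputs a class-probability vector and that the fused score is a normalized weighted sum of those vectors. The function names, channel names, weights, and probabilities are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def weighted_late_fusion(scores: Dict[str, List[float]],
                         weights: Dict[str, float]) -> List[float]:
    """Fuse per-channel class probabilities with a normalized weighted sum.

    Assumption: this is one common form of weighted late fusion; the
    thesis may define its weighting differently.
    """
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("channel weights must sum to a positive value")
    num_classes = len(next(iter(scores.values())))
    fused = [0.0] * num_classes
    for name, probs in scores.items():
        if len(probs) != num_classes:
            raise ValueError(f"channel {name!r} has a different class count")
        w = weights[name] / total  # normalize so the weights sum to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def predict(fused: List[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs of three channels for a 3-class problem.
channel_scores = {
    "rgb": [0.6, 0.3, 0.1],
    "flow": [0.2, 0.5, 0.3],
    "skeleton": [0.1, 0.7, 0.2],
}
# Hypothetical weights; in practice these might be tuned on validation data.
channel_weights = {"rgb": 1.0, "flow": 1.0, "skeleton": 2.0}

fused = weighted_late_fusion(channel_scores, channel_weights)
label = predict(fused)
```

Because the weights are normalized, the fused vector remains a probability distribution whenever each channel's output is one. An adaptive fusion variant would presumably learn or adjust the weights per sample instead of fixing them in advance.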
Keywords/Search Tags:action recognition, computer vision, CNN-LSTM, human skeleton diagram, multimodal fusion