| With the continuous development of information technology and artificial intelligence, the manufacturing industry has entered a new round of technological transformation. In manufacturing plants, employees complete a variety of complex production tasks by operating machines or collaborating with automated robots. Whether the action sequences of these tasks are standardized directly determines the production efficiency of the workshop and the quality of its products. However, in a complex workshop environment, occlusion, viewpoint changes, and lighting changes make it very difficult to capture complete motions. How to recognize and supervise the production behavior of employees in such environments has therefore become a key issue for action recognition on the factory floor. At the same time, in some hazardous workshops, employee violations may disrupt the production process and even cause safety accidents, so real-time detection of violations with corresponding early warning is also crucial. How to detect employee violations quickly and accurately has therefore become another key issue for action recognition on the factory floor. To address these problems, this thesis proposes two solutions. The main research contents are as follows. (1) For employee action recognition in complex scenes, a packing behavior recognition method based on a multi-view adaptive skeleton network is proposed. First, differential images are stacked as the model input, and two views from complementary directions are combined to handle occlusion of the human body. Next, the differential human skeletons from the two complementary views are fed into an adaptive view conversion module, which rotates each skeleton to obtain the optimal virtual viewing angle; the skeleton is then classified by a three-layer stacked LSTM network, and the classification scores from the two views are fused to produce the result. In addition, to recognize subtle actions, a local positioning image convolutional network combined with an attention mechanism is used, and the captured hand images are passed into a ResNeXt network for recognition. Finally, the results of skeleton recognition and local image recognition are fused to predict worker behavior.
(2) For recognizing employee violations in hazardous scenarios, a behavior recognition model based on Transformer and optical flow is proposed. The model consists of two feature encoders and a feature fusion module. The first is a spatial feature encoder, which takes an RGB video sequence as input and performs spatial encoding with a Transformer-based encoder to extract the spatial features of actions. The second is a temporal feature encoder, which takes optical flow images as input to extract long- and short-term temporal features. In the feature fusion module, two attention enhancement mechanisms enhance the spatial and temporal features respectively, and the spatial and temporal feature maps are then fused to obtain the recognition result. Finally, experiments on a public dataset and on a dataset collected from an actual industrial scene verify the effectiveness of the proposed methods and show that they meet the requirements of actual industrial production. |
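The coordinate-level steps of the skeleton branch in method (1) (differencing, rotation to a virtual view, and two-view score fusion) can be sketched as follows. This is a minimal standard-library Python sketch under several assumptions. Each skeleton frame is taken to be a list of (x, y, z) joint coordinates, and the "differential" input is interpreted as frame-to-frame joint displacement. The rotation follows the common view-adaptive formulation v' = R(v - d) with R = Rx(α)Ry(β)Rz(γ). In the proposed network the angles and translation would be learned by the adaptive view conversion module and the class scores would come from the three-layer stacked LSTM. Here both are passed in as plain parameters. The function names (`temporal_difference`, `to_virtual_view`, `fuse_views`) are hypothetical, and the equal-weight softmax averaging is one common late-fusion choice, not necessarily the rule used in the thesis.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]
Frame = List[Joint]
Matrix = List[List[float]]


def temporal_difference(seq: Sequence[Frame]) -> List[Frame]:
    """Joint displacement between consecutive frames (output is one frame shorter)."""
    return [
        [(b[0] - a[0], b[1] - a[1], b[2] - a[2]) for a, b in zip(f0, f1)]
        for f0, f1 in zip(seq[:-1], seq[1:])
    ]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(alpha: float, beta: float, gamma: float) -> Matrix:
    """R = Rx(alpha) @ Ry(beta) @ Rz(gamma); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rx = [[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rz = [[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(_matmul(rx, ry), rz)


def to_virtual_view(frame: Frame, angles: Tuple[float, float, float],
                    translation: Joint = (0.0, 0.0, 0.0)) -> Frame:
    """Shift each joint by -translation, then rotate it: v' = R (v - d).

    Assumption: angles/translation are fixed inputs here; in the thesis they
    would be produced by the adaptive view conversion module.
    """
    r = rotation_matrix(*angles)
    out = []
    for joint in frame:
        v = [joint[i] - translation[i] for i in range(3)]
        out.append(tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3)))
    return out


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(scores_a: Sequence[float], scores_b: Sequence[float],
               weight_a: float = 0.5) -> Tuple[int, List[float]]:
    """Late fusion: weighted average of per-view softmax probabilities.

    Returns (predicted class index, fused probabilities). The weighting
    scheme is an assumption, not taken from the thesis.
    """
    pa, pb = softmax(scores_a), softmax(scores_b)
    fused = [weight_a * x + (1.0 - weight_a) * y for x, y in zip(pa, pb)]
    return fused.index(max(fused)), fused


if __name__ == "__main__":
    # Two joints tracked over three frames (toy data).
    seq = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.1, 0.0), (1.0, 0.2, 0.0)],
        [(0.0, 0.3, 0.0), (1.0, 0.5, 0.0)],
    ]
    diffs = temporal_difference(seq)
    rotated = to_virtual_view(diffs[0], angles=(0.0, 0.0, math.pi / 2))
    # Hypothetical per-view class scores standing in for LSTM outputs.
    label, probs = fuse_views([2.0, 0.5, 0.1], [1.0, 1.5, 0.2])
    print(len(diffs), rotated, label)
```

Under the same assumption, `fuse_views` could also combine the skeleton-branch and hand-image-branch scores in the final fusion step. Averaging probabilities rather than raw scores keeps the two branches on a comparable scale even if their logits differ in magnitude.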