
Multi-modal Human Action Recognition Based On Deep Learning

Posted on: 2022-08-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y R Chen  Full Text: PDF
GTID: 2518306569498444  Subject: Control Engineering
Abstract/Summary:
In recent years, with the great success of deep learning methods across many fields, artificial intelligence has gradually moved from laboratory exhibits to practical applications. Human action recognition is one of the main research directions in artificial intelligence, and its applications are extensive: in industrial settings, action recognition can be used to supervise workers and alert them to prohibited actions or operating errors. Although researchers have achieved notable results in human action recognition, existing methods do not transfer well to assembly scenes. This thesis therefore proposes a solution to the action recognition problem in the assembly process. The solution uses data from two modalities, video and inertial sensors; the algorithm is divided into two main parts, feature extraction and recognition, and recognition is performed frame by frame to maximize accuracy.

This thesis designs a collection system for the two modalities. An inertial measurement unit (IMU) integrated with a USB camera enables long-term, stable data collection and synchronizes the multi-modal streams during acquisition; after collection, the data are annotated and assembled into a dataset.

Given the assembly application scenario and the characteristics of the video and IMU modalities, the feature extraction stage uses two different algorithms. Video features are extracted with the I3D network, which is built from 3D convolution modules: a sliding window extracts features one time segment at a time, and the per-window features are then stacked along the temporal dimension. Because IMU data are low-dimensional, their features are extracted with a network of 1D convolution modules, which is more efficient than an ordinary convolutional network.

To classify the multi-modal features, this thesis proposes a feature fusion method based on an attention mechanism together with a multi-level temporal convolutional network. The classification network consists mainly of 1D convolution modules with a kernel size of 1; adding dilated convolutions to capture non-neighborhood temporal features improves the recognition performance of the algorithm.

To validate the proposed recognition algorithm, two sets of comparative experiments are conducted on a self-collected dataset and a public dataset. First, the algorithm is evaluated on the public UTD-MHAD dataset and the results are compared with previous work; second, recognition on different modality combinations of the self-collected dataset verifies the effectiveness of multi-modality for action recognition. The dataset collection system designed in this thesis efficiently acquires and synchronizes multi-modal data, and the proposed algorithm achieves accurate frame-by-frame recognition of multi-modal action data. In practice, it can be deployed in manual factory assembly and combined with anomaly detection to improve assembly efficiency.
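To make the sliding-window feature extraction described above concrete, here is a minimal Python (PyTorch) sketch: a generic pretrained 3D CNN encoder stands in for I3D (which is not reimplemented here), encodes each fixed-length window of frames, and the per-window features are stacked along the time axis. The window length, stride, and the `encoder` interface are illustrative assumptions, not the thesis's exact settings.

```python
# Hedged sketch of sliding-window video feature extraction.
import torch

def sliding_window_features(frames, encoder, window=16, stride=16):
    """frames: (T, C, H, W) video tensor; encoder maps a
    (1, C, window, H, W) clip to a (1, D) feature vector."""
    feats = []
    for start in range(0, frames.shape[0] - window + 1, stride):
        clip = frames[start:start + window]            # (window, C, H, W)
        clip = clip.permute(1, 0, 2, 3).unsqueeze(0)   # (1, C, window, H, W)
        with torch.no_grad():
            feats.append(encoder(clip))                # (1, D) per window
    # Stack per-window features along a new temporal axis: (1, D, num_windows).
    return torch.stack(feats, dim=-1)

# Dummy usage: a placeholder encoder that global-average-pools each clip.
dummy_encoder = lambda clip: clip.mean(dim=(2, 3, 4))       # (1, C)
video = torch.randn(100, 3, 112, 112)                       # 100 RGB frames
print(sliding_window_features(video, dummy_encoder).shape)  # (1, 3, 6)
```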
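The fusion and frame-wise classification stage can be sketched in the same spirit. The following is a minimal sketch assuming per-frame attention scores for each modality, concatenation of the weighted features, and a stack of kernel-size-1 and dilated 1D convolutions; the layer widths, the scalar-per-modality attention form, the number of dilated levels, and the residual connections are assumptions, not the thesis's exact design.

```python
# Minimal sketch of attention-based fusion + dilated temporal classification.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weights each modality's per-frame features with learned attention
    scores, then concatenates them along the channel axis."""
    def __init__(self, video_dim=1024, imu_dim=128):
        super().__init__()
        # One scalar attention score per modality per frame.
        self.score_video = nn.Conv1d(video_dim, 1, kernel_size=1)
        self.score_imu = nn.Conv1d(imu_dim, 1, kernel_size=1)

    def forward(self, video_feats, imu_feats):
        # video_feats: (B, video_dim, T), imu_feats: (B, imu_dim, T)
        scores = torch.cat([self.score_video(video_feats),
                            self.score_imu(imu_feats)], dim=1)  # (B, 2, T)
        weights = torch.softmax(scores, dim=1)  # normalize across modalities
        return torch.cat([video_feats * weights[:, 0:1],
                          imu_feats * weights[:, 1:2]], dim=1)

class DilatedTCN(nn.Module):
    """Kernel-size-1 projection followed by dilated 1D convolutions,
    producing a class distribution for every frame."""
    def __init__(self, in_dim, num_classes, hidden=64, levels=4):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, hidden, kernel_size=1)
        # Exponentially growing dilation widens the temporal receptive field
        # without pooling, so per-frame resolution is preserved.
        self.blocks = nn.ModuleList([
            nn.Conv1d(hidden, hidden, kernel_size=3,
                      padding=2 ** i, dilation=2 ** i)
            for i in range(levels)
        ])
        self.head = nn.Conv1d(hidden, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.proj(x)
        for block in self.blocks:
            x = x + torch.relu(block(x))  # residual connection
        return self.head(x)  # (B, num_classes, T): one prediction per frame

# Usage with dummy per-frame features for a 300-frame sequence:
fusion = AttentionFusion()
tcn = DilatedTCN(in_dim=1024 + 128, num_classes=10)
logits = tcn(fusion(torch.randn(2, 1024, 300), torch.randn(2, 128, 300)))
print(logits.shape)  # torch.Size([2, 10, 300])
```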
Keywords/Search Tags: deep learning, multi-modal, action recognition, dataset collection, feature extraction