Research On Human Action Recognition Algorithm Based On Machine Vision

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: J H Yu
GTID: 2428330602479269
Subject: Detection Technology and Automation
Abstract/Summary:
Human action recognition, which aims to recognize activity in continuous video, is a key technology in human-robot interaction and has performed well in applications such as security monitoring, medical care, and gaming. Methods based on CNNs and RNNs, including 2D and 3D networks, are the main research direction. Although 2D CNNs currently perform better in applications, 3D CNNs offer more room for exploration from a research perspective and are likely to be a future hotspot.

In traditional research, 2D CNNs are used to process RGB data sets. Although this approach performs well, it has several deficiencies: recognition accuracy is low under changes in illumination intensity and under occlusion, long-term continuous action recognition is not effective, and RGB data sets cannot represent increasingly complicated practical applications. In recent years, researchers have proposed human action recognition algorithms based on 3D CNNs and LSTMs, which improve performance. However, the complex framework of a 3D CNN leads to too many parameters, and it does not train well. A traditional LSTM cannot extract and learn from video frames in a targeted manner. In addition, although existing public RGB-D data sets are relatively large, they differ from real scenes and their data is not well preprocessed, so algorithms cannot be trained on them efficiently.

To solve these problems, this thesis introduces dense connections and an attention mechanism to improve the 3D CNN and the LSTM, respectively. On this basis, a new fusion model for recognizing RGB-D data is proposed. The algorithm is largely robust to illumination changes and occlusion and achieves a high recognition rate in different complex environments. It also optimizes the network structure and improves the efficiency of parameter usage. An RGB-D data set of real scenes is also built, and a better preprocessing method is given. The main contributions are summarized below.

Firstly, a two-channel 3D CNN is used to extract the RGB and depth features, and dense connections are introduced to realize parameter sharing, which improves training and feature extraction performance. A new experimental method for selecting 3D convolution kernels is proposed. On this basis, a real-time feature fusion method is adopted, which takes into account what the two modal features have in common and obtains more effective features.

Secondly, a soft attention mechanism is introduced in the LSTM, and each element in the input feature vector is assigned a corresponding weight so that the network can learn from each video frame in a targeted manner. This method removes redundant information and improves the processing of global long-term features. Combined with the local short-term feature processing of the 3D CNN, the proposed algorithm processes temporal information well and improves the recognition rate for complex and similar behaviors.

Finally, in order to better train the proposed network, an RGB-D human action data set was built. Multiple experiments were performed on the self-built data set, the SBU Kinect data set, and the MSR-Action3D data set. These include testing a variety of traditional algorithms and state-of-the-art approaches on the self-built data set, and testing the same approaches on the public data sets. Detailed analysis and comparison of the experimental results verify the correctness and effectiveness of the proposed algorithm.
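
The soft attention step described in the second contribution can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the dot-product score against a `query` vector, the function names `softmax` and `soft_attention`, and the toy feature values are all assumptions made for illustration; this is not the thesis's implementation. The sketch shows only the general idea: per-frame features receive normalized weights, and a weighted sum forms the context passed on to later processing.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(
    frame_features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Weight per-frame feature vectors and return (weights, context).

    Assumption: each frame is scored by a dot product with `query`,
    a hypothetical stand-in for whatever learned scoring the thesis uses.
    """
    scores = [
        sum(f * q for f, q in zip(frame, query)) for frame in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Context vector: attention-weighted sum of the frame features.
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy features for three frames (made-up numbers, not thesis data).
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = soft_attention(frames, query=[1.0, 1.0])
    print("weights:", weights)
    print("context:", context)
```

In this sketch, frames whose features align more closely with the query receive larger weights, which is one way less informative frames can be down-weighted before the sequence is summarized.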
Keywords/Search Tags: Human action recognition, Deep learning, RGB-D dataset, Dense connections, Soft attention mechanism