
Study Of Indoor Human Action Recognition Based On RGB-D Video

Posted on: 2020-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhang
Full Text: PDF
GTID: 2428330602950472
Subject: Control theory and control engineering
Abstract/Summary:
At present, population aging is a major social problem, and the daily care of the elderly in particular is a pressing issue. Although many countries rely on manual care, for a country like China with a population of more than 1.3 billion this traditional approach cannot meet the enormous social demand. A feasible way to address the problem is to use human action recognition technology to identify the routine daily behavior of the elderly, enabling automatic care and reducing the social burden.

In human action recognition, feature extraction and classification are the two most important steps. Most existing feature extraction techniques are based on RGB images, but they are affected by illumination and complex backgrounds, which lowers recognition accuracy. With the release of Kinect, researchers began to use the RGB-D data it collects for feature extraction. Compared with RGB images, depth images and skeletal data are insensitive to changes in lighting, complex motion backgrounds, and so on. Therefore, to improve the accuracy of action recognition, this thesis proposes a human action recognition algorithm based on RGB-D video that combines depth images and skeletal data. The main research contents of this thesis are as follows.

1. For the depth image sequences captured by Kinect, this thesis first studies the DMM (Depth Motion Map) of the depth images. The DMM is computed from differences over the whole depth image sequence and therefore lacks the temporal information of the action. To address this, the thesis proposes a WDMM (Weighted Depth Motion Map) that incorporates depth-image keyframes and compensates for the DMM's missing temporal information. LBP (Local Binary Pattern) features are then extracted from the WDMM to obtain the depth image features.
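To make the depth-feature pipeline concrete, below is a minimal Python sketch of a weighted depth motion map followed by LBP histogram extraction. The linear temporal weighting, the uniform-LBP settings, and the function names are illustrative assumptions; the thesis's exact keyframe selection and weighting formula are not given in this abstract.

import numpy as np
from skimage.feature import local_binary_pattern

def weighted_depth_motion_map(depth_frames):
    """depth_frames: sequence of 2-D depth images (front-view projection).

    A plain DMM sums |D_{t+1} - D_t| over the whole sequence and loses
    temporal order; here later frame differences receive larger weights,
    which is one simple way to reinject temporal information (an assumed
    weighting, not necessarily the thesis's exact formula).
    """
    frames = np.asarray(depth_frames, dtype=np.float32)
    diffs = np.abs(np.diff(frames, axis=0))          # frame-to-frame motion energy
    weights = np.linspace(1.0, 2.0, num=len(diffs))  # hypothetical linear weights
    wdmm = np.tensordot(weights, diffs, axes=1)      # weighted accumulation over time
    return wdmm / (wdmm.max() + 1e-8)                # normalise to [0, 1]

def wdmm_lbp_feature(depth_frames, P=8, R=1):
    """Uniform-LBP histogram of the WDMM as the depth-image feature vector."""
    wdmm = weighted_depth_motion_map(depth_frames)
    wdmm_u8 = (wdmm * 255).astype(np.uint8)          # quantise before LBP
    lbp = local_binary_pattern(wdmm_u8, P=P, R=R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist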
2. For the skeletal data collected by Kinect, this thesis extracts skeletal posture features, local features, and multi-time-scale displacement features. The skeletal posture feature effectively expresses the pose of the human body at a given moment; the local skeletal features represent interaction between the human and objects; and the multi-time-scale skeletal displacement features mitigate the problem of large intra-class motion differences. Finally, the three features are fused to obtain the final skeletal feature representation.

3. Most existing human action recognition algorithms rely on a single feature. In this thesis, the extracted depth image features and skeletal features are merged into a fusion feature. The fusion makes full use of the rich three-dimensional structural information of the depth image features and the insensitivity of the skeletal features to complex motion backgrounds, so the two data types complement each other. The experimental results show that the fusion feature achieves higher accuracy than either single feature.

4. To address the long training time of the traditional SVM (Support Vector Machine) classifier, this thesis proposes classifying the features with the ELM (Extreme Learning Machine) algorithm, which effectively shortens training time without reducing recognition accuracy compared with the SVM (see the sketch after this abstract).

Finally, the algorithm proposed in this thesis is verified on the MSR Action3D dataset. The experimental results show that the proposed RGB-D video based human action recognition algorithm outperforms traditional human action recognition algorithms.
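As a sketch of the classification stage, the following Python code fuses the two feature vectors by simple concatenation and trains an ELM: a random, fixed hidden layer followed by closed-form least-squares output weights, which is what makes ELM training fast. The hidden-layer size, sigmoid activation, and variable names are assumptions for illustration, not the thesis's exact configuration.

import numpy as np

class ELMClassifier:
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid hidden-layer activations.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Random, untrained input weights -- the key to ELM's fast training.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T                         # closed-form output weights
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

# Usage with fused features (hypothetical arrays):
# fused_train = np.hstack([depth_feats_train, skeleton_feats_train])  # concatenation fusion
# clf = ELMClassifier(n_hidden=500).fit(fused_train, labels_train)
# preds = clf.predict(np.hstack([depth_feats_test, skeleton_feats_test]))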
Keywords/Search Tags: Kinect, Human action recognition, Multimodal feature extraction, Feature fusion, ELM