
Research On 3D Data Based Human Action Recognition

Posted on: 2020-09-07    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C K Li    Full Text: PDF
GTID: 1488306518457154    Subject: Information and Communication Engineering
Abstract/Summary:
Human action recognition is a hot research topic in computer vision and artificial intelligence, with wide applications such as next-generation smart homes, self-service stores, intelligent video surveillance, and interactive entertainment. Previous research on action recognition was mostly based on RGB video, and its accuracy has not been satisfactory because of problems such as viewpoint changes, lighting changes, and complex backgrounds. With the development of depth sensors, 3D data such as depth maps and skeletons can be obtained more easily. Compared with RGB data, 3D data provide the structural information of the 3D scene and are robust to scale and illumination changes, so action recognition based on 3D data is currently a hot topic. In recent years, deep learning has achieved excellent performance on many computer vision tasks, and applying it to action recognition is an important research direction. This thesis studies how to extract effective spatio-temporal features from 3D data with deep networks. The main contributions are as follows:

1. An effective method is proposed to encode the spatio-temporal information of a skeleton sequence into a texture image. The pairwise joint distances of each frame are computed and arranged in one column, so that a sequence forms an image referred to as a joint distance map (JDM); CNNs are then employed to extract discriminative features from the JDMs (a sketch of this encoding follows the list). Compared with the joint trajectory map (JTM), the JDM is less sensitive to view changes. Compared with the previous best method, the proposed method improves accuracy by 4.95% and 2.30% on the NTU RGB+D and UTD-MHAD datasets, respectively.

2. An action recognition method based on multi-stream networks is proposed. Three CNNs and three RNNs extract complementary features, exploiting the strength of LSTM models in temporal modeling and of CNN models in spatial modeling. Specifically, three views constructed in the spatial domain are fed to three LSTM networks, and three views constructed from improved joint trajectory maps (IJTMs) are fed to three CNNs. Decision fusion combines the recognition scores of all views. Compared with the previous best scheme, the proposed method improves accuracy by 1.65%, 7.48%, and 3.35% on the NTU RGB+D, UTD-MHAD, and MSRC-12 Kinect Gesture datasets, respectively.

3. An action recognition framework based on spatio-temporal attention is proposed. The joints of each frame are mapped into an image, and local and global spatio-temporal features are extracted by a 3D ConvLSTM. A convolutional network is designed to focus on the vital spatial region at each time step, and key frames are selected using the information of all time steps to generate an attentive dynamic map (ADM) by temporal pooling. The ADM is intended not only to capture the dynamic information of human motion but also to embed the spatio-temporal attention used to classify the action. The efficacy of the proposed method has been verified on three datasets; compared with the previous attention-based method, the average improvement on the NTU RGB+D dataset is 8.29%.

4. A deep fusion network (DFN) is proposed. A 3D DenseNet is designed to extract features from RGB or depth maps, and an IndRNN extracts features from the skeleton sequence. The two features are fully fused in the DFN by a Kronecker product, and the correlation between them is then removed by a 1D CNN (a sketch of this fusion step also follows the list). To cope with a missing modality, a retrieval estimation model is designed to replace the lost modality with a useful feature learned from the other modality. Compared with the current best fusion methods, the proposed method improves accuracy by 1.90%, 1.53%, and 7.61% on the NTU RGB+D, UTD-MHAD, and SYSU-3D datasets, respectively.
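To make the JDM encoding of contribution 1 concrete, the following is a minimal NumPy sketch based only on the description above: the pairwise joint distances of each frame form one image column. The function name, the gray-level normalization, and the random test sequence are illustrative assumptions, not the thesis configuration.

```python
import numpy as np

def joint_distance_map(skeleton):
    """Encode a skeleton sequence (T frames, J joints, 3D coordinates)
    as a JDM texture image: column t holds the pairwise joint distances
    of frame t, mapped to gray levels in [0, 255]."""
    T, J, _ = skeleton.shape
    rows, cols = np.triu_indices(J, k=1)      # the J*(J-1)/2 joint pairs
    jdm = np.empty((len(rows), T))
    for t in range(T):
        # pairwise Euclidean distances between all joints of frame t
        diff = skeleton[t, :, None, :] - skeleton[t, None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        jdm[:, t] = dist[rows, cols]          # one column per frame
    # normalize to gray levels; such images are then fed to CNNs
    jdm = (jdm - jdm.min()) / (np.ptp(jdm) + 1e-8)
    return (jdm * 255).astype(np.uint8)

# usage: a random 40-frame sequence with 25 joints (as in NTU RGB+D)
image = joint_distance_map(np.random.rand(40, 25, 3))
print(image.shape)   # (300, 40): 25*24/2 joint pairs by 40 frames
```

The fusion step of contribution 4 can be sketched similarly. For per-sample feature vectors, the Kronecker product reduces to a flattened outer product; the layer widths, kernel size, and pooling below are assumptions chosen only to make the sketch runnable.

```python
import torch
import torch.nn as nn

class KroneckerFusion(nn.Module):
    """Sketch of the DFN fusion step: combine an RGB/depth feature and a
    skeleton feature with a Kronecker (outer) product, then pass the
    result through a 1D CNN to reduce the correlation between them.
    All dimensions here are illustrative assumptions."""
    def __init__(self, d_rgb=256, d_skel=128, n_classes=60):
        super().__init__()
        self.decorrelate = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=9, stride=4, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(512),
        )
        self.classifier = nn.Linear(8 * 512, n_classes)

    def forward(self, f_rgb, f_skel):
        # per-sample Kronecker product = flattened outer product: (B, d_rgb*d_skel)
        fused = torch.einsum('bi,bj->bij', f_rgb, f_skel).flatten(1)
        x = self.decorrelate(fused.unsqueeze(1))    # (B, 8, 512)
        return self.classifier(x.flatten(1))        # class scores

# usage with random vectors standing in for the 3D DenseNet / IndRNN outputs
scores = KroneckerFusion()(torch.randn(4, 256), torch.randn(4, 128))
print(scores.shape)   # torch.Size([4, 60]); NTU RGB+D has 60 classes
```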
Keywords/Search Tags: 3D data, Attention, Convolutional Neural Networks, Recurrent Neural Networks, Action recognition