
Human Interaction Recognition Based On The Fusion Of RGB And Skeleton Data

Posted on: 2020-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: L L Qin
GTID: 2428330605980534
Subject: Engineering

Abstract/Summary:
Human interaction recognition from video is a hot topic in image processing and computer vision. Because RGB video lacks depth information, the accuracy and real-time performance of recognition based on RGB video alone do not meet the practical requirements of relevant industries. In recent years, Microsoft's Kinect device has made it possible to acquire skeleton data with depth information directly, providing an effective information supplement to RGB video. Meanwhile, deep learning networks can extract deep features from images directly, which greatly improves the accuracy of action recognition. This thesis therefore studies convolutional neural network (CNN) models based on the fusion of RGB and skeleton data.

First, skeleton data must be encoded into images before it can be fed to a CNN for recognition. Existing encoding schemes, however, do not adequately capture the spatial relationships between joint points or the interaction between two people. A joint-distance feature is therefore introduced to encode the skeleton data; the encoded images are then sent to a CNN to learn deep features and recognize two-person interactions. The resulting algorithm is easy to operate and performs well in real time.

Second, existing joint-information encoding methods suffer serious loss of spatial and temporal information. This thesis therefore proposes a novel joint motion map that encodes both temporal and spatial information and, combined with a CNN, achieves good recognition results. The method requires no complex preprocessing of the joint points and supports real-time processing.

Finally, according to the respective characteristics of RGB video and joint-point data, a CNN model based on dual-stream fusion of RGB and joint-point information is proposed. The RGB stream uses keyframes to obtain spatio-temporal images, while the joint stream encodes joint points into motion images. The two kinds of feature maps are fed to CNNs to obtain recognition scores, and the final result is obtained by fusing the recognition scores of the two streams. The method is simple to implement and transfers well. A two-person interaction recognition framework based on multi-source information fusion is thus successfully established.
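The final fusion step described above can be sketched as late (score-level) fusion: each stream's CNN produces a per-class score vector, and the vectors are combined before taking the predicted class. The abstract does not specify the fusion rule or weights, so the weighted average and the `alpha` parameter below are assumptions for illustration only.

```python
import numpy as np

def fuse_scores(rgb_scores, skeleton_scores, alpha=0.5):
    """Score-level fusion of the RGB stream and the skeleton stream.

    alpha weights the RGB stream; (1 - alpha) weights the skeleton
    stream. Both inputs are per-class score (probability) vectors.
    Returns the fused score vector and the predicted class index.
    """
    rgb = np.asarray(rgb_scores, dtype=float)
    skel = np.asarray(skeleton_scores, dtype=float)
    fused = alpha * rgb + (1.0 - alpha) * skel
    return fused, int(np.argmax(fused))

# Hypothetical example with 3 interaction classes
rgb_scores = [0.6, 0.3, 0.1]       # from the RGB spatio-temporal stream
skeleton_scores = [0.2, 0.7, 0.1]  # from the joint-motion-map stream
fused, label = fuse_scores(rgb_scores, skeleton_scores, alpha=0.5)
# fused = [0.4, 0.5, 0.1] -> predicted class index 1
```

With equal weights, the skeleton stream's confidence in class 1 outweighs the RGB stream's preference for class 0; tuning `alpha` lets the framework favor whichever modality is more reliable for a given dataset.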
Keywords/Search Tags: Human Interaction Recognition, RGB Video, Skeleton Data, Deep Learning, Information Fusion