Research On Representation Of Human Activities Based On The Context Of Object Affordances

Posted on: 2018-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: G J Huang
GTID: 2348330512990710
Subject: Biomedical engineering
Abstract/Summary:
The goal of visual understanding of the environment and of human activities is to give a visual system human-like perception and reasoning ability, which is the highest goal of computer vision research. However, because human motion is non-rigid and has many degrees of freedom, among other factors, progress in perceiving human activities and objects in dynamic environments has been relatively slow. Inspired by related research in neuroscience, cognitive science and psychology, we construct a representation of the mutual context of objects and activities by mining the inner relations between objects and human activities in a dynamic environment, and ultimately use it to understand human activities through robot vision. The main work of this thesis is as follows.

Firstly, to narrow the semantic gap, visual attributes are used as a semantic bridge between low-level features and high-level semantics, mapping image input to high-level semantic output. We study object visual attributes as the intermediate representation layer of such models and show that these models transfer well to new tasks. We also study visual attribute classification algorithms based on combined features and on unsupervised feature learning. Experiments show that the classification method based on unsupervised feature learning achieves better accuracy.

Secondly, reasoning about objects and their affordances is a fundamental problem of visual intelligence. Most previous work simplifies this reasoning problem into a classification problem. We construct an object network pattern model based on a Markov logic network and use inference to solve the problem of reasoning about object affordances. Using object visual attributes as the model's intermediate representation layer improves its ability to represent objects with few or no training samples. By combining human pose with other metadata sources, we handle a rich set of reasoning problems in a unified framework.

Thirdly, the trajectories of objects interacting with a human carry a wealth of contextual information about human activities. Accurate trajectories are difficult to obtain from human action videos collected in a dynamic home environment. To address this problem, we construct scale features from depth image information and propose an object detection-tracking fusion algorithm. The algorithm achieves high accuracy and lays the foundation for the subsequent work.

Finally, to represent human action and its contextual information, we first establish a human action model based on a Spatial-Temporal Markov Random Field (S-TMRF). We then construct spatial and temporal features that describe human actions, objects and their relationships, and define an energy function based on the S-TMRF to represent those relationships. Lastly, combining the object network pattern model with the human action model, we build a human behavior reasoning framework based on the S-TMRF and the Markov logic network inference mechanism to understand human behavior.
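The abstract does not give the form of the S-TMRF energy function. As a rough illustration only, the sketch below assumes a chain-structured model with three kinds of terms: a per-frame evidence cost, an action-object context cost, and a temporal transition cost. All names (st_mrf_energy, best_labeling), the toy actions and the cost values are hypothetical and are not taken from the thesis.

```python
from itertools import product


def st_mrf_energy(labels, unary, temporal_cost, context_cost, objects):
    """Energy of one action labeling over T frames (lower is better).

    Assumed form, not the thesis's actual function:
    E = sum_t unary[t][a_t] + context(a_t, o_t) + temporal(a_{t-1}, a_t)
    """
    energy = 0.0
    for t, action in enumerate(labels):
        energy += unary[t][action]                   # per-frame evidence cost
        energy += context_cost[(action, objects[t])]  # action-object context
        if t > 0:
            energy += temporal_cost[(labels[t - 1], action)]  # transition cost
    return energy


def best_labeling(actions, unary, temporal_cost, context_cost, objects):
    """Exhaustive minimum-energy search; feasible only for toy problems."""
    candidates = product(actions, repeat=len(unary))
    return min(
        candidates,
        key=lambda ls: st_mrf_energy(ls, unary, temporal_cost, context_cost, objects),
    )


# Hypothetical toy data: three frames, two actions, one tracked object.
actions = ("reach", "drink")
objects = ["cup", "cup", "cup"]
unary = [
    {"reach": 0.2, "drink": 1.0},
    {"reach": 0.9, "drink": 0.8},
    {"reach": 1.0, "drink": 0.3},
]
temporal_cost = {
    ("reach", "reach"): 0.0,
    ("reach", "drink"): 0.3,
    ("drink", "drink"): 0.0,
    ("drink", "reach"): 1.0,
}
context_cost = {("reach", "cup"): 0.1, ("drink", "cup"): 0.0}

map_labels = best_labeling(actions, unary, temporal_cost, context_cost, objects)
map_energy = st_mrf_energy(map_labels, unary, temporal_cost, context_cost, objects)
print(map_labels, round(map_energy, 3))
```

Exhaustive search is used here only to keep the example short; on a chain like this one, dynamic programming would find the same minimum far more efficiently for longer sequences.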
Keywords/Search Tags: Object affordance, Human activity recognition, Object detection and tracking, Markov random field