Human-object Interaction Activity Recognition Based On Deep Learning Mechanism

Posted on: 2016-01-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Bai
Full Text: PDF
GTID: 1108330503953428
Subject: Computer Science and Technology
Abstract/Summary:
Human-object interaction activity recognition is one of the most challenging tasks in image understanding, and it is central to improving image understanding performance. Automatically recognizing human-object interaction activity is useful for applications such as content-based image retrieval, image collection, human-computer interaction, and security automation. In this dissertation, we analyze the deep hierarchical structure and the deep learning mechanism of the human cortex, summarize its fundamental procedure, framework, and core functions in recognizing human-object interaction activity, and take these as the research reference for automatic recognition of human-object interaction activity. The main research contents and innovations of this dissertation comprise the following five parts:

(1) Taking the deep hierarchical structure and the deep learning mechanism of the cortex as the research reference, this dissertation designs a novel framework for human-object interaction activity recognition. Corresponding to the four core sub-tasks of the deep learning process of the human cortex in recognizing human-object interaction activity, the framework contains four sub-models that together achieve the recognition. Along the hierarchy of the framework, the four models perform, respectively, reconstruction of 3D object locations from a single monocular image, detection of the visual composites in an image, human-object interaction activity recognition, and image caption generation.

(2) Based on the imaging rules of the perspective projection of 3D object locations, we propose a novel model to reconstruct 3D object locations from a single monocular image. The model uses an abstract discrete analysis method to reconstruct the depth information of the special area where depth changes continuously and consistently, and then uses these depth locations to facilitate the reconstruction of the 3D locations of objects. In the experiments, the model outperforms state-of-the-art methods in reconstructing the absolute depth locations of objects, reconstructing their relative depth locations, and estimating their real sizes.

(3) We propose a novel image visual composite detection model by modeling the 3D spatial relationship between humans and objects. The model predicts the visual composites by estimating the strength of the co-occurrence interaction between the human and specific objects with each visual composite. Compared with the Visual Phrase model, the Mutual Model, and the Model of Group of Objects, our model performs better both in analyzing the 3D spatial relationship between humans and objects across visual composite classes and in visual composite detection accuracy.

(4) Taking the approaches and procedures of the PC area of the cortex in recognizing human-object interaction activity as the research reference, we propose a novel human-object interaction activity recognition model. Following the deep hierarchical architecture of the cortex, the model uses a novel factored three-way interaction machine to model the 3D spatial relationship between humans and objects as an a priori condition, which facilitates the extraction of high-level invariant features of human-object interaction activity, and it trains the model parameters with the typical fast training algorithm of deep learning. The proposed model significantly improves the precision of human-object interaction activity recognition.

(5) The dissertation designs a novel image caption generation model with two parts: prediction of the main semantic relationship of an image and generation of the image description. The former first discovers the scene stuff most relevant to the specific human-object interaction activity by analyzing the strength of the co-occurrence between the scene stuff and that activity, and then predicts the most suitable prepositional phrase to describe the logical relationship between them. Based on this logical relationship, the latter automatically generates syntactically and semantically well-formed image captions using lexicalized PCFG (probabilistic context-free grammar) syntactic trees. The model not only describes the human-object interaction activity in images correctly, but also generates sentences with better grammatical normativity and cognitive rationality.
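
To make the caption generation step in part (5) more concrete, the following is a minimal sketch of how a predicted relationship could be turned into a sentence by expanding a small PCFG. It is an illustration only, not the dissertation's model: the grammar rules, their probabilities, the slot names (`SUBJ`, `V`, `OBJ`, `P`, `SCENE`), and the function names are all hypothetical, and the grammar is a plain PCFG rather than a lexicalized one.

```python
import random

# Toy PCFG: nonterminal -> list of (probability, right-hand side).
# Rules and probabilities are invented for illustration (assumption).
GRAMMAR = {
    "S": [(1.0, ["NP_SUBJ", "VP"])],
    "VP": [(0.8, ["V", "NP_OBJ", "PP"]), (0.2, ["V", "NP_OBJ"])],
    "PP": [(1.0, ["P", "NP_SCENE"])],
    "NP_SUBJ": [(1.0, ["Det", "SUBJ"])],
    "NP_OBJ": [(1.0, ["Det", "OBJ"])],
    "NP_SCENE": [(1.0, ["the", "SCENE"])],
    "Det": [(0.6, ["a"]), (0.4, ["the"])],
}


def expand(symbol, slots, rng):
    """Recursively expand a symbol into a list of words."""
    if symbol in slots:
        # Lexical slot filled from the predicted relationship.
        return [slots[symbol]]
    if symbol not in GRAMMAR:
        # Anything without a rule is treated as a literal word.
        return [symbol]
    rules = GRAMMAR[symbol]
    weights = [prob for prob, _ in rules]
    # Sample one production according to its probability.
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(expand(child, slots, rng))
    return words


def generate_caption(slots, seed=0):
    """Sample one caption from the toy grammar for the given slots."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    words = expand("S", slots, rng)
    return " ".join(words).capitalize() + "."


# Hypothetical prediction: activity (man rides horse) plus the scene
# stuff and preposition chosen by the relationship prediction step.
EXAMPLE_SLOTS = {
    "SUBJ": "man", "V": "rides", "OBJ": "horse",
    "P": "on", "SCENE": "beach",
}

if __name__ == "__main__":
    print(generate_caption(EXAMPLE_SLOTS))
```

In this sketch the prepositional phrase is optional in the grammar, so the same slots can yield a caption with or without the scene; a lexicalized PCFG would additionally condition each rule on head words, which is omitted here for brevity.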
Keywords/Search Tags: Human-object interaction activity, Deep learning mechanism, Deep hierarchical architecture, Image visual composite, 3D spatial layout between human and object