Font Size: a A A

Integrating Knowledge For Human Parsing And Understanding

Posted on:2024-09-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:X H WangFull Text:PDF
GTID:1528307079952219Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Human society is naturally dominated by visual information,and aided by other information such as speech and sound.The theory of computer vision has become a key to advance in artificial intelligence(AI).Current research on computer vision technology involves perception,cognition,and reasoning.Among them,perception is the core foundation toward cognition and reasoning.Currently,vision-based object recognition,object detection,and other general object perception technologies have made important progress in the past decade.Notably,the ultimate goal of AI technology is to serve human beings.As an important foundation of the human-centered vision computing theory,research on human parsing aims to explore human-related vision intelligence technologies,which are able to accurately identify the basic characteristics of the human body,such as posture,shape,and other properties,from a visual media such as images or videos.This research not only promotes the visual understanding of human beings by machines,but also advances contemporary AI applications such as “smart manufacturing”,“smart surveillance”,and “smart retailing”.Human parsing corresponds to the process of human perception,which relies on an efficient visual mechanism.By analyzing the visual characteristics of the human body,the purpose of human understanding can be achieved.Essentially,it needs to solve two basic problems: 1.how to distinguish human bodies in images,and 2.what are the basic characteristics of each human body part? Motivated by the rapid development of deep learning technology,a series of deep neural network based human parsing models have been derived,and various paradigms such as two-stage and single-stage parsing have been developed.However,current human parsing algorithms are still challenged by complex conditions such as crowded scenes,limited data samples,fine-grained reasoning and visual perturbations.Deep neural network based human parsing models have shown to be vulnerable to complex conditions,making it difficult to be extended to applications in real-world scenarios.Therefore,a reliable and high-quality human parsing model is able to solve these two fundamental problems under various conditions.On the other hand,making full use of known knowledge is key to handling those complex conditions.In the computer vision,there are two types of knowledge that is commonly used to enhance deep neural networks.The first one is the implicit knowledge,which is often stored in a neural network by learning a specific vision task.The second one is the association law that is summarized from large scale datasets,involving many prior knowledges such as commonsense knowledge.Hence,this dissertation studies knowledge-integrated human parsing algorithm from four aspects.In summary,four innovative results are achieved:1.To address the problem of model degradation in complex visual scenes,this dissertation proposes a parsing method based on semantic knowledge of categories.Specifically,it is difficult for a parsing model to extract stable yet semantically discriminative pixel representations in complex scenes.Meanwhile,due to the problem of ambiguous label,identical visual content of a body part is with two different semantic labels,which further accelerates the degradation of a parsing model.To handle these challenges,this dissertation designs an adaptive parsing neural network.By constructing semantic representations of body part categories and an adaptive parsing mechanism,the robustness of pixel representations in crowded scenes are enhanced,thus reducing the degradation of a parsing model.2.To address the problem of low-quality training samples,this dissertation proposes to construct multi-task parsing knowledges describing various human characteristic,and designs a knowledge transfer mechanism to improve the parsing ability of a model under limited sample learning.Specifically,a high-precision parsing model relies on highquality labeled data samples.When it comes to insufficient labeled data or the problem of unbalanced categories,a parsing model intends to learn biased parsing knowledge,which makes it difficult to generalize to multiple realistic scenarios.To tackle this,this dissertation divides parsing tasks of human characteristics into two domains,where a task with rich labeled samples is grouped into the source task domain,while a task with limited labeled samples is grouped into the target task domain.The knowledge learned by the model in the source task domain is converted to the one for the target task domain by exploiting the common knowledge among tasks,thus enhancing the parsing ability of a model in the target task domain.3.To address complex reasoning-based parsing task such as attribute parsing,this dissertation explores a general modeling approach based on knowledge integration.Unlike traditional human parsing tasks,the fine-grained attribute parsing task requires a parsing model to parse multiple semantic concepts from limited local visual content.Current human body parsing models are trapped by the limited human visual information,which makes it difficult to handle the variations of body attributes under various scenarios.Leveraging the visual knowledge is the key to break through the limitations of existing methods.Based on this,this dissertation considers both implicit and explicit visual knowledges,and designs an encoder that incorporates implicit knowledge and a decoder that utilizes explicit knowledge of human properties.The proposed method unifies the implicit/explicit knowledge of human body parsing into the neural network.The resulting knowledge-driven approach to fine-grained attribute parsing is implemented and serves as an exemplar for two classical fine-grained human body parsing tasks.4.To address the problem of model degradation under visual perturbation,this dissertation proposes a human body parsing paradigm that incorporates causal properties.Specifically,this dissertation analyzes the human parsing from a causal perspective,and constructs a causal representation learning method based on two causal principles that are summarized in causal learning theory.By guiding a human parsing model to explore the causal factors behind the human parsing process and constructing latent representations that satisfy causal properties,the human parsing model is significantly improved in terms of anti-interference ability.
Keywords/Search Tags:Visual Perception, Human Parsing, Knowledge Transfer, Knowledge Integration, Causal Representation Learning
PDF Full Text Request
Related items