Human Parsing With Dynamic Co-attention Network

Posted on: 2022-05-10

Degree: Master

Type: Thesis

Country: China

Candidate: S C Li

GTID: 2518306524976489

Subject: Signal and Information Processing

Abstract/Summary:

Human parsing aims to identify body parts and clothing items in human images at the pixel level. Current approaches build on semantic segmentation methods and introduce auxiliary information related to the human body. When people post photos on social networks, they often share multiple pictures of themselves in the same outfit, taken in different poses or from different viewpoints. These extra pictures can naturally serve as auxiliary information. Inspired by this observation, a raw image showing another pose is provided as a reference, and the current image is parsed with the help of correspondences found between the two images.

This thesis proposes a dynamic co-attention network (DCANet) for the human parsing task. DCANet consists of a backbone that extracts features, a dynamic attention module that refines the current feature and introduces auxiliary information, and a decoder module that supplements global and spatial information and increases feature resolution. The dynamic attention module is the most important component for feature refinement and is divided into three parts: (i) the Dynamic Filter Module (DFM), which extracts dynamic weights and affinity from the current feature to strengthen structure-related characteristics; (ii) the Dynamic Co-Attention Module (DCAM), which finds the most related regions between the paired images and extracts auxiliary information from the reference feature to improve the current feature; and (iii) the Dynamic Calculation Module (DCM), which applies the dynamic weights and affinity to the input feature to produce the module's output feature. In this way, the refined feature is obtained by extracting structure information from the current feature and auxiliary information from the reference feature. By fusing these features with the original current feature, the network obtains a final refined feature that combines the information extracted from both images, represents the current image more strongly, and is more beneficial for parsing.

To demonstrate the effectiveness of DCANet, this thesis collects a new human parsing (HP) dataset consisting of paired images of the same person in the same outfit but in different poses or from different viewpoints, where only one image in each pair has ground truth. Extensive experiments on the HP dataset show both the contribution of each module and the superiority of DCANet over other state-of-the-art human parsing methods.
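
The idea behind the Dynamic Co-Attention Module, finding related regions between the paired images and pulling auxiliary information from the reference feature into the current feature, can be illustrated with a generic attention sketch. The abstract does not give the actual DCAM formulation, so everything below is an assumption for illustration only: the function name `co_attention`, the scaled dot-product affinity, the softmax normalization, and the residual fusion weight `alpha` are hypothetical choices, not the thesis's method. Features are flattened to one vector per spatial position.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def co_attention(current, reference, alpha=0.5):
    """Refine `current` features with information from `reference`.

    current:   N positions x C channels (list of lists) from the image to parse
    reference: M positions x C channels from the other-pose image
    alpha:     hypothetical fusion weight for the auxiliary information

    NOTE: scaled dot-product attention is an assumed stand-in; the
    thesis abstract does not specify how DCAM computes affinity.
    """
    scale = math.sqrt(len(current[0]))
    refined = []
    for query in current:
        # Affinity between this current-image position and every
        # reference-image position.
        scores = [
            sum(q * r for q, r in zip(query, ref_vec)) / scale
            for ref_vec in reference
        ]
        weights = softmax(scores)
        # Auxiliary feature: affinity-weighted sum of reference features.
        auxiliary = [
            sum(w * ref_vec[c] for w, ref_vec in zip(weights, reference))
            for c in range(len(query))
        ]
        # Fuse the auxiliary feature with the original current feature.
        refined.append([q + alpha * a for q, a in zip(query, auxiliary)])
    return refined


if __name__ == "__main__":
    current_feat = [[1.0, 0.0], [0.0, 1.0]]
    reference_feat = [[2.0, 2.0], [0.0, 4.0]]
    for row in co_attention(current_feat, reference_feat):
        print(row)
```

In this sketch each position of the current image attends over all positions of the reference image, so the output keeps the spatial layout of the current feature while mixing in reference content from the most similar regions. Setting `alpha` to 0 returns the current feature unchanged, which makes the contribution of the reference image easy to isolate.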

Keywords/Search Tags:

Human parsing, semantic segmentation, co-attention

Related items

1	Human Parsing Based On Deep Learning
2	Accurate Human Parsing Based On Deep Convolutional Network
3	The Research Of Contextual Information Fusion Algorithms In Semantic Segmentation
4	Research On Road Semantic Segmentation Method Based On Memory And Attention Mechanism
5	Research On Human Semantic Segmentation Based On Deep Learning
6	Grammar Constrained Double-Layer Encoder Decoder For Neural Semantic Parsing
7	Research And Application Of Image Semantic Segmentation Based On Deep Fully Convolutional Networks
8	Research On Generating 3D Human Avatar With Texture Map Based On Images
9	Joint Models For Chinese Morphological Syntactic And Semantic Parsing
10	Research On Semantic Parsing Methods Of Natural Language Interaction For Service Robot