
Human Parsing With Dynamic Co-attention Network

Posted on: 2022-05-10 | Degree: Master | Type: Thesis
Country: China | Candidate: S C Li
GTID: 2518306524976489 | Subject: Signal and Information Processing
Abstract/Summary:
Human parsing aims to identify body parts and clothing items in human images at the pixel level. Current works build on semantic segmentation methods and introduce auxiliary information related to the human body. When people post photos on social networks, they often share multiple pictures taken in different poses or from different viewpoints while wearing the same outfit, and these extra pictures can naturally serve as auxiliary information. Inspired by this, a raw image of the same person in another pose is provided as a reference, and the current image is parsed with its help by finding the correspondence between the two images.

This thesis proposes a dynamic co-attention network (DCANet) for the human parsing task. DCANet consists of a backbone that extracts features, a dynamic attention module that refines the current feature and introduces auxiliary information, and a decoder module that supplements global and spatial information and increases feature resolution. The dynamic attention module is the most important part for refining features, and it can be divided into three parts: (i) the Dynamic Filter Module (DFM), which extracts dynamic weights and affinities from the current feature to improve its structure-related characteristics; (ii) the Dynamic Co-Attention Module (DCAM), which finds the most related regions between the paired images and extracts auxiliary information from the reference feature to improve the current feature; and (iii) the Dynamic Calculation Module (DCM), which applies the dynamic weights and affinities to the input feature to obtain the module's output feature. In this way, refined features are obtained by extracting structural information from the current feature and auxiliary information from the reference feature. By fusing these features with the original current feature, the network produces a final refined feature that combines the rich information extracted from both images, represents the current image more strongly, and is more beneficial for parsing.

To demonstrate the effectiveness of DCANet, this thesis collects a new human parsing (HP) dataset consisting of image pairs that show the same person wearing the same outfit in different poses or from different viewpoints, where only one image of each pair has ground-truth annotation. Extensive experiments on the HP dataset show both the contribution of each module and the superiority of DCANet over other state-of-the-art human parsing methods.
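
The abstract does not give the equations behind DFM, DCAM or DCM, so the following is only a rough sketch of the general idea under stated assumptions. It assumes the co-attention step uses a bilinear affinity S = X_c W X_r^T followed by a row-wise softmax (a common co-attention formulation, not confirmed as DCANet's), and it substitutes plain non-local self-attention for the dynamic filter and calculation modules. The function names co_attention, self_refine and fuse are hypothetical. Feature maps are flattened to (positions x channels) lists so the example runs with the standard library only.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = spatial positions, columns = channels


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def co_attention(cur: Matrix, ref: Matrix, w: Matrix) -> Matrix:
    """Gather auxiliary features from the reference image for each current position.

    cur: N x C current-image features, ref: M x C reference-image features,
    w: C x C weight matrix (learned in a real network; fixed here).
    Assumed bilinear formulation, not necessarily the thesis's DCAM.
    """
    # Affinity S[i][j] = cur_i^T W ref_j: how related current position i is
    # to reference position j.
    s = matmul(matmul(cur, w), transpose(ref))
    # Row-wise softmax: each current position spreads its attention over the
    # reference positions, then takes a weighted sum of reference features.
    return matmul(softmax_rows(s), ref)  # N x C


def self_refine(cur: Matrix) -> Matrix:
    """Hypothetical stand-in for DFM + DCM: an affinity computed from the
    current feature alone and applied back to it (plain self-attention)."""
    return matmul(softmax_rows(matmul(cur, transpose(cur))), cur)


def fuse(cur: Matrix, refined: Matrix, aux: Matrix) -> Matrix:
    """Concatenate original, self-refined and reference-derived features per position."""
    return [c + r + x for c, r, x in zip(cur, refined, aux)]


random.seed(0)
C = 4
cur = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(6)]  # 2 x 3 map, flattened
ref = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(5)]  # reference image
w = [[1.0 if i == j else 0.0 for j in range(C)] for i in range(C)]  # identity W for the demo
out = fuse(cur, self_refine(cur), co_attention(cur, ref, w))
print(len(out), len(out[0]))  # 6 12
```

Concatenating the three feature sets loosely mirrors the abstract's description of fusing refined features with the original current feature. A real network would presumably follow the concatenation with a learned projection to reduce the channel count; that step is omitted here.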
Keywords/Search Tags: human parsing, semantic segmentation, co-attention