
Generative Adversarial Network Based Human Target Understanding And Analysis

Posted on: 2021-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: J L Tang
Full Text: PDF
GTID: 2428330614968289
Subject: Information and Communication Engineering
Abstract/Summary:
Understanding and analyzing human targets is a core function of intelligent surveillance video (image) processing systems, with urgent practical demand and broad application prospects in security and other fields. It is also one of the most popular research directions in computer vision. Focusing on human targets in videos and images, this thesis studies two computer vision tasks, crowd counting and human action prediction, from the perspectives of the entire crowd and the individual person, respectively. The main work and contributions of this thesis are as follows:

1. This thesis applies the generative adversarial network as a unified framework for the high-quality image generation problem involved in both crowd counting and human action prediction. Specifically, based on the generative adversarial network, it designs model structures suited to the requirements of each task, generating density maps with sharp details and predicted frames with realistic appearance, respectively.

2. For crowd counting, this thesis proposes a generative adversarial network based method for generating high-quality crowd density maps that localize the crowd precisely. Specifically, it designs a generator on a feature pyramid network backbone. Lateral connections between the bottom-up and top-down pathways of the feature pyramid network fuse low-level features, which carry rich spatial location information about the crowd, with high-level features, which carry abundant semantic information about the crowd, giving the model both spatial and semantic perception of the crowd. In addition, spatial and channel attention mechanisms are introduced to make feature extraction and feature fusion selective.

3. For human action prediction, this thesis proposes a generative adversarial network based method for appearance-preserving human action prediction guided by human pose. First, an LSTM based pose prediction network predicts the coordinates of human landmarks in future frames. Then, using the human pose information represented by these landmark coordinates, a single-frame generator combines a global generative adversarial network and a local generative adversarial network to capture appearance features at different levels and reconstruct human appearance in a coarse-to-fine, global-to-local way, producing high-quality single frames. Finally, a video refinement network based on a 3D encoder-decoder architecture further improves the overall quality of the generated videos.
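
The abstract does not say how a crowd density map is constructed or how a count is read from it. A common convention in crowd counting, assumed here purely for illustration, is to place one normalized Gaussian at each annotated head position so that the map sums to the number of people. The sketch below shows that convention in plain Python; the function names `density_map` and `crowd_count`, the value of `sigma`, and the sample head positions are all hypothetical and are not taken from the thesis.

```python
import math


def density_map(heads, height, width, sigma=1.5):
    """Build a density map from (row, col) head annotations.

    Assumption: each head contributes a Gaussian that is normalized over
    the image grid so that it sums to 1. The full map therefore sums to
    the number of annotated heads.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for head_row, head_col in heads:
        # Unnormalized Gaussian centered on this head.
        kernel = [
            [
                math.exp(-((r - head_row) ** 2 + (c - head_col) ** 2)
                         / (2 * sigma ** 2))
                for c in range(width)
            ]
            for r in range(height)
        ]
        total = sum(map(sum, kernel))
        # Normalize so this head adds exactly one unit of mass to the map.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += kernel[r][c] / total
    return dmap


def crowd_count(dmap):
    """Estimate the crowd count as the sum over the density map."""
    return sum(map(sum, dmap))


# Hypothetical annotations: three heads in a 10 x 10 image.
heads = [(2, 3), (5, 5), (7, 1)]
dmap = density_map(heads, height=10, width=10)
estimated = crowd_count(dmap)
```

Under this convention, a generator that outputs a density map yields both a count (the sum of the map) and a spatial distribution (where the mass sits), which is why sharp, well-localized maps matter for the crowd counting task described above.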
Keywords/Search Tags: Human Target Understanding and Analysis, Crowd Counting, Human Action Prediction, Generative Adversarial Network