Research On Recognition And Generation Methods Based On Incomplete Visual Data

Posted on: 2022-09-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Xu
Full Text: PDF
GTID: 1488306569459404
Subject: Cyberspace security
Abstract/Summary:
Incomplete visual data refers to visual data whose category information or visual samples are incomplete. As technology advances, more and more visual data is collected. Compared with professionally curated datasets, however, the visual data collected in daily life is incomplete, which hinders its understanding and application. This thesis addresses recognition and generation tasks based on incomplete visual data. For recognition, we aim to recognize unlabeled visual data based on well-annotated data; in particular, we study zero-shot action recognition and zero-shot learning. For generation, we aim to generate complete counterparts of incomplete visual samples in various situations; here we mainly focus on multi-view face synthesis and semantic face editing based on generative models. We summarize our novelty and contributions as follows:

1. We propose visually-connected graph convolutional networks for transductive zero-shot action recognition. With the explosive growth of action categories, zero-shot action recognition aims to extend a well-trained model to novel/unseen classes. To bridge the large knowledge gap between seen and unseen classes, we visually associate unseen actions with seen categories in a visually-connected graph, and the knowledge is then transferred from the visual feature space to the semantic space via the Grouped Attention Graph Convolutional Networks (GAGCN). In particular, we extract visual features for all actions and build a visually-connected graph that attaches seen actions to visually similar unseen categories. Moreover, the proposed grouped attention mechanism exploits the hierarchical knowledge in the graph, so that the GAGCN can propagate visual-semantic connections from seen actions to unseen ones. We extensively evaluate the proposed method on three datasets. Experimental results show that the GAGCN outperforms state-of-the-art methods.

2. We propose a holistically-associated model for transductive zero-shot learning. A domain gap exists between the seen and unseen classes, and simply matching unseen instances by nearest-neighbor search in the embedding space cannot bridge this gap effectively. We propose a Holistically-Associated Model to overcome this obstacle. In particular, the proposed model is designed to combat two fundamental problems of zero-shot learning (ZSL): representation learning and label assignment for the unseen classes. The first problem is addressed by an affinity propagation network, which considers holistic pairwise connections among all classes to produce exemplar features of the unseen samples. The second is addressed by a progressive clustering module, which iteratively refines the unseen clusters so that holistic unseen instance features can be used for reliable class-wise label assignment. Thanks to the precise exemplar features and class-wise label assignment, our model eliminates the domain gap effectively. We extensively evaluate the proposed model on five human action and image datasets. Experimental results show that the proposed model outperforms state-of-the-art methods on these substantially different datasets.

3. We propose a face flow-guided generative adversarial network that synthesizes multi-view faces progressively. We combat the large-angle face synthesis problem by dividing it into a series of easy small-angle rotations, each of which is guided by a face flow to maintain faithful facial details. In particular, we propose a Face Flow-guided Generative Adversarial Network (FFlow GAN) that is specifically trained for small-angle synthesis. The proposed network consists of two modules. The first, a face flow module, computes a dense correspondence between the input and target faces. It provides strong guidance to the second, a face synthesis module, for emphasizing salient facial texture. We apply FFlow GAN multiple times to progressively synthesize different views, so that facial features can be propagated to the target view from the very beginning. All of these executions are cascaded and trained end-to-end with a unified back-propagation, which ensures that each intermediate step contributes to the final result. Extensive experiments demonstrate that the proposed divide-and-conquer strategy is effective, and that our method outperforms the state of the art on four benchmark datasets both qualitatively and quantitatively.

4. We propose a new generative adversarial network (GAN) inversion method based on consecutive images for semantic face editing. Existing GAN inversion methods are stuck in a paradox: the inverted codes can either achieve high-fidelity reconstruction or retain editing capability. Having only one of the two clearly cannot realize real image editing. We resolve this paradox by introducing consecutive images into the inversion process. The rationale behind our solution is that the continuity of consecutive images leads to inherent editable directions. This inborn property is used for two purposes: 1) regularizing the joint inversion process, such that each inverted code is semantically accessible from one of the others and anchored in an editable domain; 2) enforcing inter-image coherence, such that the fidelity of each inverted code can be maximized with the complement of the other images. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in terms of reconstruction fidelity and editability on both real-image and synthetic datasets. Furthermore, our method provides the first support for video-based GAN inversion, as well as an interesting application: unsupervised semantic transfer from consecutive images.
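
To make the graph-based idea in the first contribution more concrete, the following is a minimal sketch of how a visually-connected graph over seen and unseen classes could be built and used for one round of graph-convolution propagation. It is not the thesis's GAGCN: the grouped attention mechanism is omitted, and the function names (`build_visual_graph`, `normalize_adjacency`, `gcn_layer`), the cosine-similarity k-nearest-neighbor rule, the value of `k`, and the random features are all assumptions made purely for illustration.

```python
import numpy as np

def build_visual_graph(seen_feats, unseen_feats, k=2):
    """Link each unseen class to its k most visually similar seen classes.

    Hypothetical construction: cosine similarity between per-class visual
    features, with self-loops on every node. Returns a symmetric adjacency.
    """
    feats = np.vstack([seen_feats, unseen_feats])
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T                      # pairwise cosine similarity
    n_seen, n = len(seen_feats), len(feats)
    adj = np.eye(n)                          # self-loops
    for u in range(n_seen, n):
        nearest = np.argsort(-sim[u, :n_seen])[:k]
        adj[u, nearest] = 1.0                # unseen -> seen edge
        adj[nearest, u] = 1.0                # keep the graph undirected
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, as in a standard GCN."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, node_feats, weight):
    """One plain graph-convolution step with ReLU (no attention)."""
    return np.maximum(adj_norm @ node_feats @ weight, 0.0)

# Toy data (assumed): 5 seen and 3 unseen classes, 16-d visual features.
rng = np.random.default_rng(0)
seen = rng.normal(size=(5, 16))
unseen = rng.normal(size=(3, 16))

adj = build_visual_graph(seen, unseen, k=2)
adj_norm = normalize_adjacency(adj)

node_feats = rng.normal(size=(8, 10))        # hypothetical per-class inputs
weight = rng.normal(size=(10, 4))            # hypothetical layer weights
out = gcn_layer(adj_norm, node_feats, weight)
print(out.shape)
```

In this sketch, information reaches an unseen class only through the seen classes it is visually attached to, which is the intuition behind propagating connections from seen actions to unseen ones; a trained model would learn `weight` rather than sample it.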
Keywords/Search Tags:Incomplete visual data, Zero-Shot Learning, Zero-Shot Action Recognition, Face Synthesis and Recognition, Face Editing, GAN Inversion