
Part-based Representation Learning For Fine-grained Image Recognition And Generation

Posted on: 2023-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H L Zheng
Full Text: PDF
GTID: 1528306902959619
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of deep learning, artificial intelligence technology has been widely integrated into people’s daily lives. For example, people can search for a product online simply by taking a picture of it, and they can pay for it by scanning their faces. Old photos can be brought back to life, and anyone can be a movie star with video editing tools. AI-empowered applications can be seen everywhere. The success of these applications can be attributed to powerful image representations. Since objects are composed of parts, part-based image representation is a fundamental representation method that fits the inherent properties of objects. However, it is challenging to obtain semantic parts: for supervised part learning methods, scalability is limited by the huge cost of labeling parts; for weakly-supervised part learning methods, it is difficult to obtain parts that are fine-grained and semantically consistent (i.e., the localized parts in different samples share the same semantics).

To solve these problems, this dissertation investigates the learning and application of part-based image representation from three aspects. First, it studies how to improve the performance of weakly-supervised part learning methods that explicitly predict part positions. Second, to address the complex training process and huge computational cost of explicit part learning, it designs a more efficient implicit part learning method, in which part information is leveraged in the feature learning process of convolutional neural networks. Finally, it extends the implicit part learning methods to image generation tasks, showing that part-based image representation generalizes beyond recognition.

In terms of explicit part learning, this dissertation studies how to precisely localize small and semantically consistent parts and proposes a multi-attention network as well as a progressive attention network. Specifically, existing approaches predominantly solve the challenges of part localization and feature learning independently, neglecting the fact that the two are mutually correlated. This dissertation proposes a weakly-supervised part learning method based on multi-attention mechanisms. The proposed multi-attention network consists of a channel grouping layer and a corresponding loss function that enables the model to be trained end-to-end, so that part localization and feature learning can mutually reinforce each other. The underlying image structure is mined by using both image-level labels and spatial priors of object parts, enabling the proposed model to localize parts stably and consistently, which significantly improves fine-grained image recognition accuracy. Moreover, in order to pinpoint fine-grained parts, a progressive attention model is proposed. In this model, an attention rectification mechanism is designed for localizing small parts and achieves precise localization in a localization-and-rectification manner. The proposed progressive attention model can be applied recurrently to obtain hierarchical parts, which makes it possible to leverage multi-scale fine-grained part features and further boost recognition accuracy.

In terms of implicit part learning, this dissertation studies how to efficiently integrate part-based representation into the process of feature learning and proposes an attention sampling network and a deep bilinear network. Specifically, to address the problems of too many hyperparameters, complex training processes, and huge computational costs in explicit part learning, this dissertation proposes a trilinear attention sampling network. This model consists of three components: a trilinear attention mechanism, which enhances each convolutional channel into an attention map by modeling the correlation of the convolutional channels; an attention sampling module, which transfers the part information into re-sampled images; and a knowledge distillation-based training strategy, which efficiently integrates the part information into a single convolutional neural network. Moreover, in order to design a more general backbone that incorporates part-based image representation into the feature extraction of each layer of the network, this dissertation proposes a deep bilinear network. This network combines part-based image representation with high-order feature expression. By designing a grouped bilinear module with semantic grouping constraints, the semantic expression ability of implicit part features is enhanced, and the dimension of the high-order features is reduced so that they can be efficiently integrated into each block of a deep network. In both works, the proposed methods effectively improve model capacity and achieve significant improvements in fine-grained image recognition tasks.

In terms of generative part learning, this dissertation investigates how to learn and leverage part information in generative adversarial networks and proposes a semantic-aware generator. Specifically, this study builds on the research on implicit part learning. It is shown that semantically disentangling the latent space of the generative network effectively simplifies the mapping from the latent space to the image space, which helps to synthesize high-fidelity images and achieve semantic-specific control. The semantic disentangling module and the semantic fusion module designed in this work can effectively extract and utilize semantic parts in generative networks, leading to a semantically controllable image generation model.
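As an illustration of the trilinear attention idea mentioned in the abstract (turning each convolutional channel into an attention map by modeling correlations between channels), the following is a minimal sketch in plain Python. It is not the dissertation's implementation: the function name `trilinear_attention`, the helper functions, the use of a row-wise softmax for normalization, and the toy input are all assumptions made for illustration. The abstract does not specify the exact normalization, so this should be read as one plausible simplified form.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def trilinear_attention(features):
    """Hypothetical simplified trilinear attention.

    `features` has one row per channel; each row is a flattened
    h*w spatial map. Returns one attention map per channel.
    """
    # Channel-to-channel correlation (c x c), normalised per row (assumption).
    relation = [softmax(row) for row in matmul(features, transpose(features))]
    # Each output map is a correlation-weighted mix of all channel maps,
    # so channels that respond to the same part reinforce each other.
    return matmul(relation, features)


# Toy input: 3 channels over a flattened 2x2 spatial grid (made-up values).
feats = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
]
attention_maps = trilinear_attention(feats)
```

The result has the same shape as the input (one map per channel). Because each relation row sums to one, every attention map is a convex combination of the original channel maps; in the network described in the abstract, such maps would then drive the attention sampling module.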
Keywords/Search Tags: Part-based Image Representation, Fine-grained Image Recognition, Image Generation, Weakly-supervised Learning, Attention Model