
Automatic Graph Representation Learning Theory And Its Application In Image Processing

Posted on: 2024-04-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 1528307340974399
Subject: Computer application technology
Abstract/Summary:
Research in cognitive science, brain science, and neuroscience increasingly suggests that the behavior of complex systems, whether human society or the brain, is determined by the interactions among their constituent elements. Computer vision, an important branch of artificial intelligence, performs pattern recognition and intelligent control by extracting valuable information from images. For advanced intelligent perception and image interpretation, geometric representation and relationship modeling of image content is an important and challenging task. Graph data are a universal language for describing complex systems and provide an interface for directly generating structured knowledge and relational reasoning. The advantage of graph structures is that graph nodes can independently represent the attributes of objects, while edges can flexibly establish relationships between objects. In image recognition tasks, objects and scenes are highly structured. Modeling images from a graph perspective therefore offers greater flexibility, is an effective way to achieve relationship modeling and reasoning, and is an interdisciplinary field worthy of researchers' attention. The focus of this intersection is how to design graph learning models suited to image representation, rather than how to develop message-passing methods on graphs. In this dissertation, we exploit the flexibility and effectiveness of graph modeling to design several deep graph vision models, as follows:

(1) To address the reliance of existing graph convolutional network (GCN)-based image representation methods on manually constructed and updated graph structures, an automatic graph-learning convolutional network is designed that unifies graph learning and hyperspectral image classification in a "network-in-network" architecture. The method uses graph structures to model the interactions between higher-order tensors in images. Exploiting the powerful learning and representation capabilities of convolutional neural networks, it embeds a semi-supervised two-branch network into the GCN to learn and update the graphs automatically. The GCN further encodes and reasons over the dynamic graphs, and a learnable graph reprojection matrix then remaps the graph representations back to image grid features. The method is automatic in two senses: graph representation learning is designed and updated by an end-to-end network, and the process is driven by the hyperspectral image classification task.

(2) To address image processing methods that ignore multi-scale modeling in the graph domain, a hierarchical dynamic graph clustering network for visual feature learning is proposed. The method mines coarse-to-fine graph representations of images in a data-adaptive and task-adaptive manner. In forward propagation, an adaptive clustering network learns latent prior knowledge of class separation in images and generates cluster-based coarsened graphs; a GCN then diffuses, transforms, and aggregates information among the clusters. The coarsened graph representations are mapped back to grid features according to their affinity with linearly projected features. To further improve the task adaptation of the clusters and the multi-scale graph representations, the clustering network and the GCN are trained jointly. In backpropagation, gradient descent dynamically adapts the initial graph features and clusters to generate task-oriented optimal clustering spaces.

(3) Drawing on the recognition mechanism of the human visual system, an interpretable multi-resolution contourlet network is proposed for image classification. The approach balances graph representation learning with the multi-scale and multi-directional characteristics of images, and provides interpretable theoretical support for optimizing the model structure. Specifically, contourlets model the singularities of an image to capture its multi-resolution, multi-directional, nonlinear, and sparse geometric features. The method builds superpixel-based region graphs and encodes multi-resolution contourlet coefficients into the graph structure for graph representation learning. Given the statistical properties of the contourlet coefficients, the Mahalanobis distance is used to compute the adjacency matrices of the nodes, while the GCN learns a more abstract, multi-level contourlet-enhanced graph representation. Finally, a parameterized graph assignment matrix yields an associative representation of the image.

(4) Structural modeling increases sample complexity and helps a model learn more discriminative features. To obtain comprehensive and reliable structural information about images, a structure-aware multi-scale graph Transformer model is proposed. The approach first models the multi-scale information of an image from the graph perspective: GCNs extract multi-scale graph representations in parallel, and the multi-branch graph representations are then iteratively cross-fused across resolutions. Because GCNs focus on information transfer, aggregation, and transformation between nodes, they ignore structural information in the graph domain; in Transformers, the attention mechanism handles interactions between sequences without introducing any structural inductive bias. To produce more expressive graph representations, we therefore explicitly model structural information between nodes with a graph structure-aware attention mechanism: r-hop subgraphs provide vector representations of the local substructures of the graph, and the attention mechanism computes similarities between these local structure vectors to reassign attention weights.

(5) For the small-sample classification problem, a heterogeneous Riemannian manifold metric learning network is designed. Category representation and similarity metrics are key issues in small-sample learning. Human brain perception operates on cognitive nonlinear manifolds, and Riemannian manifolds model images better than linear modeling in Euclidean space, so we map image features into three heterogeneous, complementary manifold spaces. To facilitate measuring inter-sample distances, the method uses Riemannian kernel functions to map the heterogeneous manifolds into high-dimensional reproducing kernel Hilbert spaces, and we compute the intra-class and inter-class distances of the images in each Hilbert space. The kernel trick then yields the aggregated distances in a low-dimensional subspace. Specifically, a neural network approximates the optimal aggregation subspace of the hybrid manifolds, and orthogonality constraints applied to the network's output solve for the eigenvectors of the subspace. Finally, the difference between the intra-class and inter-class distances serves as the optimization objective of the distance metric. The proposed method can be trained end to end, so the learned distance metric generalizes better to unseen data.
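The project-reason-reproject cycle at the heart of method (1) can be illustrated with a minimal numpy sketch. All quantities here are hypothetical stand-ins: the soft assignment scores, GCN weights, and node adjacency would be learned end to end by the actual network rather than supplied by hand.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gcn_layer(A, H, W):
    # One symmetrically normalized graph convolution with ReLU:
    # D^{-1/2} (A + I) D^{-1/2} H W
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

def grid_to_graph_and_back(X, assign_logits, A, W):
    """X: (HW, C) grid features; assign_logits: (HW, K) learned scores
    assigning each grid location to one of K graph nodes."""
    P = softmax(assign_logits, axis=1)  # soft pixel-to-node assignment
    H = P.T @ X                         # project grid features onto graph nodes
    H = gcn_layer(A, H, W)              # reason over the (dynamic) graph
    return P @ H                        # reproject node features to the grid
```

The learnable reprojection in the dissertation is represented here, in simplified form, by reusing the assignment matrix `P`.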
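The cluster-based graph coarsening in method (2) follows the general soft-pooling pattern: a soft assignment of nodes to clusters pools both features and connectivity. The sketch below shows only that generic pattern, not the dissertation's exact clustering network.

```python
import numpy as np

def coarsen_graph(X, A, S):
    """Cluster-based coarsening.
    X: (N, C) node features, A: (N, N) adjacency,
    S: (N, K) soft cluster assignments (rows sum to 1).
    Returns coarsened features (K, C) and coarsened adjacency (K, K)."""
    X_c = S.T @ X       # pool node features into cluster features
    A_c = S.T @ A @ S   # pool pairwise connectivity between clusters
    return X_c, A_c
```

A GCN would then operate on `(X_c, A_c)` to diffuse information among clusters, and `S` provides the affinity used to map cluster representations back to grid features.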
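The Mahalanobis-distance adjacency construction in method (3) can be sketched as follows, with per-node feature vectors (e.g. statistics of contourlet coefficients per superpixel) as hypothetical input; the Gaussian kernel used to turn distances into edge weights is one common choice, not necessarily the dissertation's.

```python
import numpy as np

def mahalanobis_adjacency(F, sigma=1.0):
    """F: (N, d) node feature vectors.
    Returns an (N, N) adjacency from a Gaussian kernel over
    pairwise Mahalanobis distances."""
    cov = np.cov(F, rowvar=False) + 1e-6 * np.eye(F.shape[1])
    VI = np.linalg.inv(cov)                  # inverse covariance
    diff = F[:, None, :] - F[None, :, :]     # (N, N, d) pairwise differences
    d2 = np.einsum('ijk,kl,ijl->ij', diff, VI, diff)  # squared Mahalanobis
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                 # no self-loops
    return A
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for the covariance structure of the coefficients, which is why it suits the statistical properties of contourlet features.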
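The structure-aware attention of method (4) mixes feature similarity with the similarity of local substructures. In this sketch, a node's r-hop reachability pattern serves as the substructure vector; this is one plausible descriptor chosen for illustration, not the dissertation's exact encoding.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structure_aware_attention(X, A, r=2):
    """X: (N, d) node features; A: (N, N) adjacency; r: hop radius.
    Attention scores combine scaled dot-product feature similarity with
    cosine similarity between r-hop substructure vectors."""
    A_r = np.linalg.matrix_power(A + np.eye(len(A)), r)   # r-hop patterns
    S = A_r / (np.linalg.norm(A_r, axis=1, keepdims=True) + 1e-9)
    struct_sim = S @ S.T                                  # substructure similarity
    feat_scores = X @ X.T / np.sqrt(X.shape[1])           # feature similarity
    attn = softmax(feat_scores + struct_sim, axis=1)      # structure-reweighted
    return attn @ X
```

Adding `struct_sim` to the raw scores is what injects the structural inductive bias that plain sequence attention lacks.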
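Finally, the kernel trick behind method (5)'s intra-class/inter-class objective can be shown concretely: the squared distance between two points in the reproducing kernel Hilbert space is k(x,x) + k(z,z) - 2k(x,z). The sketch uses a plain Euclidean RBF kernel as a placeholder for the Riemannian kernels on the heterogeneous manifolds.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def intra_inter_distances(X, y, gamma=1.0):
    """X: (N, d) features, y: (N,) integer labels.
    Returns mean intra-class and inter-class squared RKHS distances,
    computed via the kernel trick without an explicit feature map."""
    K = rbf_kernel(X, gamma)
    diag = K.diagonal()
    D = diag[:, None] + diag[None, :] - 2.0 * K          # pairwise RKHS distances
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    diff = y[:, None] != y[None, :]
    return D[same].mean(), D[diff].mean()
```

Minimizing the difference intra - inter, as the dissertation's objective does, pulls same-class samples together and pushes different classes apart in the learned subspace.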
Keywords/Search Tags:Graph Representation Learning, Deep Learning, Graph Convolution, Graph Vision Model, Image Representation, Manifold Learning