Research On The Representation Learning Of Complex Heterogeneous Data

Posted on: 2020-07-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Jian
Full Text: PDF
GTID: 1368330611992994
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the era of artificial intelligence and big data, various kinds of complex heterogeneous data have emerged, and they form the basis of data-driven artificial intelligence methods and machine learning algorithms. Representing complex heterogeneous data in a form that is useful for machine learning has therefore become a critical challenge. We analyze several typical kinds of complex heterogeneous data and propose multiple novel representation learning methods and models based on their intrinsic data characteristics and complexity.

(1) Representation learning framework and instantiated algorithms for categorical data. The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is critical yet challenging for capturing complex data characteristics in learning tasks. We propose a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. CURE is instantiated into two models: coupled data embedding (CDE) for clustering and coupled outlier scoring of high-dimensional data (COSH) for outlier detection. These instances show that CURE is flexible in value clustering and in learning the couplings between value clusters for different learning tasks. CDE embeds categorical data into a new space in which features are independent and semantics are rich. COSH represents data w.r.t. an outlying vector to capture complex outlying behaviors of objects in high-dimensional data. Substantial experiments show that CDE significantly outperforms three popular unsupervised encoding methods and three state-of-the-art similarity measures, and that COSH performs significantly better than five state-of-the-art outlier detection methods on high-dimensional data.

(2) Representation learning model for mixed data. Mixed data, which contain both categorical and continuous features, are ubiquitous in real-world applications and are a typical type of heterogeneous structural data. Learning a good representation of mixed data is critical yet challenging for further learning tasks. Existing methods for representing mixed data often overlook the heterogeneous coupling relationships between categorical and continuous features as well as the discrimination between objects. To address these issues, we propose an auto-instructive representation learning scheme that enables mutual learning between two encoding spaces for unsupervised, discrimination-enhanced representation learning. Accordingly, we design a metric-based auto-instructor (MAI) model, which consists of two collaborative instructors. Each instructor captures the feature-level couplings in mixed data with fully connected networks and guides the infinite-margin metric learning of its peer instructor with a contrastive order. When the learned representation is fed into both partition-based and density-based clustering methods, our experiments on eight UCI datasets show highly significant improvements in learning performance and much more distinguishable visualization outcomes compared with the baseline methods.

(3) Representation learning model for attributed networks. An attributed network contains both a complex network and node content information, and is thus a mixture of relational and non-relational data. The formation of a complex network is largely driven by multi-aspect node influences and interactions, which are reflected in the network structure and in the content embodied in the network nodes. Limited work has jointly modeled all these aspects; existing work typically focuses on topological structures but overlooks the heterogeneous interactions behind node linkage and the contributions of node content to the interactive heterogeneities. Here, we propose a multi-aspect interaction and influence-unified evolutionary coupled system (MAI-ECS) for network representation that involves node content and linkage-based network structure. MAI-ECS jointly and iteratively learns two systems: a multi-aspect interaction learning system that captures heterogeneous hidden interactions between nodes, and an influence propagation system that captures multi-aspect node influences and their propagation between nodes. MAI-ECS couples, unifies, and optimizes the two systems toward an effective representation of explicit node content and network structure as well as implicit node interactions and influences. MAI-ECS shows superior performance in node classification and link prediction in comparison with state-of-the-art methods on two real-world datasets. Further, we demonstrate the semantic interpretability of the results generated by MAI-ECS.

(4) Representation learning for cross-domain and multimodal data. Cross-domain data and multimodal data are two common types of complex heterogeneous data, which may exhibit distribution heterogeneity, structure heterogeneity, and modality heterogeneity. In cognition and educational theory, empathy is an important mechanism for understanding and learning others' thoughts and knowledge in order to improve one's own thinking. We regard empathy mechanisms as very valuable for cross-domain and multimodal data representation learning, an area with highly demanding applications that poses critical challenges to learning tasks including transfer learning, domain adaptation, and multimodal learning. Accordingly, a new cross-domain learning mechanism, the empathy machine (EPM), capable of source perspective-taking and target self-reflection, is proposed to mimic empathy in human learning. EPM generates two representations for the target domain: a perspective-taking representation (PTR) of target knowledge that is compatible with the source, and a self-reflection representation (SRR) of intrinsic, PTR-complementary target knowledge. Together they fuse the consensus and complementarity between heterogeneous domains. EPM is instantiated as a domain adaptation model (EPM-DA) and a multimodal learning model (EPM-MML). EPM-DA and EPM-MML are applied to semi-supervised visual domain adaptation and image-text cross-modal retrieval tasks, respectively, and achieve significant improvements compared with state-of-the-art methods.
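
As a rough illustration of the two-stage idea described for CURE in part (1), namely clustering categorical values at several granularities and then building the value representation from the resulting clusters, the following toy sketch may help. It is not the dissertation's algorithm: the choice of conditional co-occurrence frequency as the single coupling function, the use of plain k-means, the one-hot cluster-membership encoding, and all function names (`value_coupling_matrix`, `kmeans`, `embed_values`, `embed_objects`) are assumptions made only for this example.

```python
from collections import Counter
from itertools import combinations


def value_coupling_matrix(rows):
    """Assumed coupling function: matrix[i][j] estimates P(value j | value i)
    from how often the two feature values co-occur in the same object."""
    values = sorted({(f, v) for row in rows for f, v in enumerate(row)})
    index = {fv: i for i, fv in enumerate(values)}
    freq, co = Counter(), Counter()
    for row in rows:
        items = [index[(f, v)] for f, v in enumerate(row)]
        for i in items:
            freq[i] += 1
        for i, j in combinations(items, 2):
            co[(i, j)] += 1
            co[(j, i)] += 1
    n = len(values)
    matrix = [
        [1.0 if i == j else co[(i, j)] / freq[i] for j in range(n)]
        for i in range(n)
    ]
    return values, matrix


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic, evenly spaced initialisation."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def embed_values(rows, granularities=(2, 3)):
    """Cluster values at each granularity and concatenate the one-hot
    cluster memberships into one vector per (feature, value) pair."""
    values, matrix = value_coupling_matrix(rows)
    embedding = {fv: [] for fv in values}
    for k in granularities:
        labels = kmeans(matrix, min(k, len(values)))
        for fv, label in zip(values, labels):
            embedding[fv].extend(1.0 if label == c else 0.0 for c in range(k))
    return embedding


def embed_objects(rows, granularities=(2, 3)):
    """Represent each object by concatenating the vectors of its values."""
    emb = embed_values(rows, granularities)
    return [[x for f, v in enumerate(row) for x in emb[(f, v)]] for row in rows]


# Hypothetical toy data: five objects with two categorical features each.
toy_rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("a", "y")]
toy_vectors = embed_objects(toy_rows)
```

Each object vector here has one block per feature and, inside it, one one-hot sub-block per granularity. A fuller treatment would combine several coupling functions and learn from the couplings between clusters rather than stopping at membership indicators.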
Keywords/Search Tags: representation learning, machine learning, categorical data, mixed data, attributed network, cross-domain learning, multimodal learning, deep learning