Research On Unsupervised Representation Learning Methods And Applications For Complex Data

Posted on: 2022-08-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Xu
Full Text: PDF
GTID: 1488306341986209
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of information technology, a large amount of unlabeled data has been generated in practice. Mining valuable knowledge and patterns from such data is challenging because labels are absent, data types are diverse, and labeling is costly. Researchers have proposed many unsupervised representation learning and knowledge discovery methods. However, most of them handle only a single type of data, represent data from a single view, and cannot learn online. They also have difficulty with unstructured graph data. This dissertation studies unsupervised representation learning and knowledge discovery for complex data and proposes a series of representation learning methods for numerical data, categorical data, streaming data, and unstructured graph data. These methods achieve good performance on clustering analysis, concept drift detection, and community discovery. The main contents and innovations of this dissertation are summarized as follows:

(1) For the representation learning and clustering analysis of nonlinear data, the diversity of data types brings great challenges. Ensemble clustering is an effective approach, but its result is often the average of the results of multiple clustering algorithms, so it often fails to achieve "weak becoming strong". The multi-feature fusion soft subspace clustering algorithm proposed in this dissertation uses different linear or nonlinear dimensionality reduction algorithms to reduce the dimensionality of the data and generate multi-view features. Multiple weak clusters are then merged into a strong cluster by a weighting mechanism.

Categorical data has no geometric structure, which prevents effective attribute reduction. To address this problem, a fuzzy rough clustering algorithm for categorical data is proposed. It considers both the similarity of samples in the same equivalence class and the dissimilarity of samples in different equivalence classes, and converts categorical data into numerical data. The experimental results demonstrate the effectiveness of the proposed algorithm.

For concept drift detection in data streams, existing data stream clustering algorithms can detect either abrupt or gradual concept drift, but not both types within the same algorithm. An adaptive clustering algorithm for data streams with concept drift is proposed. It introduces incremental learning and a sliding window mechanism, transforms the features of each data block into numerical form, and detects concept drift by threshold partition. The experimental results show that the proposed algorithm can cluster data streams with both abrupt and gradual concept drift.

(2) Traditional graph embedding algorithms often measure the similarity of nodes directly from the first-order neighborhood relationship. However, this relationship reflects only the local relationship of nodes and cannot measure node similarity from the global structure. A neighborhood graph embedding algorithm that fuses the first-order and second-order neighborhood relationships is proposed. The two relationships are fused to define a fuzzy membership degree that reflects the membership between nodes, and these degrees form a new matrix. The embedding vectors of nodes are obtained from this matrix, so the algorithm can reflect the different relationships of nodes under different granules.

Traditional graph embedding algorithms also lack a feedback mechanism. To address this problem, the influence of a feedback mechanism on the graph embedding result is further studied, and a manifold graph embedding algorithm with an information propagation mechanism is proposed. Multi-hop connections are used to obtain the high-order information of the graph. Manifold learning and low-rank learning are then introduced to obtain the low-dimensional embedding vectors of nodes. Finally, structure information is used to adjust the embedding result. The experimental results show that the proposed algorithm not only fuses the local and global structure information of the graph but also has better robustness.

(3) Traditional graph embedding algorithms are shallow models: they can obtain only the low-level semantic features of a graph and cannot deal with attributed graphs. As an effective graph representation learning method, a graph neural network can extract high-level semantic features and effectively fuse attribute information with structure information. However, most graph neural networks cannot fuse different high-order proximity information of a graph, and they cannot selectively focus on the features that are beneficial to a task. To address these problems, a self-supervised deep graph embedding algorithm that fuses different high-order proximity information is proposed in this dissertation and applied to community discovery. Different high-order proximity information matrices are input into multiple graph neural networks to obtain multiple groups of high-level semantic features. A weighted summation or concatenation method is then adopted to fuse the features. Finally, the neural networks are trained with contrastive learning and a negative sampling mechanism. After the low-dimensional embedding vectors of nodes are obtained, spectral propagation is introduced to further enhance the embedding result. The experimental results show that the proposed algorithm not only fuses different high-order proximity information effectively but also obtains a better community structure than current mainstream algorithms.
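
The threshold-based drift detection described in part (1) can be illustrated with a minimal sketch. The abstract does not give the dissertation's actual statistic, thresholds, or update rule, so everything concrete here is an assumption made for illustration: the function name `detect_drift`, the use of the block mean as the numerical block feature, the two thresholds `gradual_thr` and `abrupt_thr`, and the learning rate `alpha`. The sketch only shows how one pass over fixed-size data blocks could separate gradual from abrupt drift by partitioning the shift against two thresholds.

```python
from statistics import mean


def detect_drift(stream, block_size=4, gradual_thr=0.5, abrupt_thr=2.0, alpha=0.5):
    """Label each data block after the first as 'stable', 'gradual', or 'abrupt'.

    Hypothetical rule (not taken from the dissertation): each block is
    summarized by its mean, and the absolute shift from a running reference
    value is partitioned by two thresholds.
    """
    # Split the stream into consecutive, non-overlapping blocks;
    # a trailing partial block is ignored.
    blocks = [
        stream[i:i + block_size]
        for i in range(0, len(stream) - block_size + 1, block_size)
    ]
    if not blocks:
        return []

    reference = mean(blocks[0])  # the first block initializes the reference
    labels = []
    for block in blocks[1:]:
        current = mean(block)
        shift = abs(current - reference)
        if shift >= abrupt_thr:
            labels.append("abrupt")
            reference = current  # abrupt drift: discard the old reference
        else:
            labels.append("gradual" if shift >= gradual_thr else "stable")
            # incremental update: move the reference toward the new block
            reference += alpha * (current - reference)
    return labels


if __name__ == "__main__":
    stream = [0, 0, 0, 0, 0.1, 0, 0.1, 0, 1, 1, 1, 1, 5, 5, 5, 5]
    print(detect_drift(stream))
```

Resetting the reference on abrupt drift while only nudging it otherwise is one simple way to let a single procedure react to both drift types; a clustering-based variant would replace the block mean with cluster statistics.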
Keywords/Search Tags:Representation Learning, Clustering Analysis, Community Discovery, Self-Supervised Learning, Graph Neural Network