Complex systems in the real world can often be modeled as networks, such as social networks, transportation networks, and citation networks. Networks are everywhere in daily life, and the analysis of network data is of great significance across disciplines and applications. In recent years, network representation learning has become a research hotspot in the field of network data analysis. Network representation learning, also called network embedding, maps a network into a low-dimensional, dense vector space while preserving as much of the topology, attributes, and other information of the original network as possible. It provides an efficient representation for big data processing: networked data is turned into vectors that can be readily applied to downstream machine learning tasks such as node classification, node clustering, and link prediction. However, with the development of society in the era of the digital economy, real-world networks are becoming increasingly large-scale and heterogeneous. How to efficiently process and analyze such huge, complex networks has become an urgent problem.

Although existing network representation learning methods have advantages in handling various network analysis tasks, several issues still deserve deeper study. First, real-world networks are heterogeneous information networks composed of different types of objects and complex relationships. How to explore the potential correlation between network structure and complex semantics during heterogeneous information network representation learning remains a difficult problem. Second, in addition to the topological structure of the network itself, the nodes in a network carry their own unique and rich external attribute information. How to integrate this attribute information with the network structure in the node representation learning process is the
key to the study of attributed network representation learning. Finally, many existing representation learning methods share the premise that the original network is reliable, yet in practice this reliability often cannot be guaranteed. Therefore, in view of these three aspects, this thesis studies deep-learning-based network representation learning methods, focusing on heterogeneous information network representation learning that preserves network semantics, attributed network representation learning that explores the potential association between external information and network topology, and self-supervised heterogeneous network representation learning on unreliable original networks. The main work completed in this thesis is as follows:

(1) To describe the complex relationships between nodes in a heterogeneous information network, a multi-view-based heterogeneous information network representation learning method is proposed. First, the complex heterogeneous information network is split according to the network semantics captured by meta-paths, and a multi-view semantic network is constructed. Then, on top of the generated multi-view network, the connection between network structure and semantics is modeled by the proposed single-view network semantic preservation and enhanced view collaboration mechanisms. Finally, an attention mechanism is used to weight and fuse the view representations to obtain the node representation.

(2) A network representation learning method is proposed that targets the potential nonlinear correlation between external network attribute information and network topology. First, a fusion matrix that preserves higher-order network information is built from the network adjacency matrix and the attribute similarity matrix. Second, in order to capture the highly nonlinear relationship between network structure and attributes, embedding modules based on deep attribute attention and deep structure attention are
designed. The two modules can show learning preferences for structure and attribute features, respectively, during training. Finally, the hidden-layer outputs of the two modules are used as the final node representation.

(3) Because the reliability of real network data is difficult to guarantee, a self-supervised heterogeneous information network representation learning method based on structure and semantic perception is proposed. First, for the original heterogeneous information network, which may be noisy and incomplete, a structure enhancement method based on node properties is applied. Then, on the one hand, the network schema structure is explored through node-level attention and type-level attention to preserve the local neighborhood structure of each node. On the other hand, a deep neural network is used to explore the commuting matrix that preserves the higher-order semantics of the network. Finally, a dedicated self-supervised learning loss is designed so that the model is trained on supervision signals mined from the data itself, yielding the final node representation.

The proposed network representation learning methods are experimentally evaluated on multiple datasets and compared with existing network representation learning methods. The experimental results demonstrate the effectiveness of the proposed methods on multiple network analysis tasks such as link prediction and node classification.
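
As a rough illustration of the ideas behind contribution (1), the sketch below builds one semantic view from a two-hop meta-path and fuses several per-view embeddings with softmax attention weights. It is a minimal NumPy sketch under stated assumptions, not the method of the thesis: the function names `metapath_view` and `attention_fuse`, the `tanh` scoring function, and the `query` vector (standing in for a learnable attention parameter) are all hypothetical, and the single-view semantic preservation and view collaboration steps are not modeled.

```python
import numpy as np

def metapath_view(first_hop, second_hop):
    """Adjacency of the view induced by a two-hop meta-path.

    For an author-paper matrix AP, metapath_view(AP, AP.T) links two
    authors whenever they share at least one paper (author-paper-author).
    """
    counts = first_hop @ second_hop      # meta-path instances per node pair
    np.fill_diagonal(counts, 0)          # ignore trivial paths back to the same node
    return (counts > 0).astype(float)

def attention_fuse(view_embeddings, query):
    """Fuse per-view node embeddings with one softmax weight per view.

    view_embeddings: array of shape (num_views, num_nodes, dim)
    query: array of shape (dim,), a hypothetical attention vector
    """
    scores = np.tanh(view_embeddings) @ query     # (num_views, num_nodes)
    scores = scores.mean(axis=1)                  # one score per view
    weights = np.exp(scores - scores.max())       # numerically stable softmax
    weights /= weights.sum()
    fused = np.tensordot(weights, view_embeddings, axes=1)  # (num_nodes, dim)
    return fused, weights

# Toy example: 3 authors, 3 papers; authors 0 and 1 share paper 1.
author_paper = np.array([[1, 1, 0],
                         [0, 1, 0],
                         [0, 0, 1]])
apa_view = metapath_view(author_paper, author_paper.T)

# Two random per-view embeddings for the 3 authors, fused into one.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2, 3, 4))
fused, weights = attention_fuse(embeddings, rng.normal(size=4))
```

In this sketch each view receives a single scalar weight shared by all nodes; a per-node weighting would be an equally plausible design, and the abstract does not say which one the thesis uses.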