
Research On Representation Learning And Application For Heterogeneous Networked Data

Posted on: 2022-07-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Z Song
GTID: 1488306728482384
Subject: Computer software and theory
Abstract/Summary:
With the development of science and technology, people increasingly rely on the convenience that information systems bring to their daily lives. For example, online shopping services such as Taobao and Jingdong are becoming the main way for the public to purchase goods, and online social networking platforms such as Weibo and Twitter let hundreds of millions of users communicate and share information every day. These platforms must process large amounts of data every day, and that data reflects user behavior patterns and system operation mechanisms. To study the complex dependencies within the data, researchers often represent it as a heterogeneous network, in which nodes represent the different kinds of samples in the dataset and different classes of links represent the different dependencies between samples.

Although data mining on heterogeneous networks has gained prominence in several fields, the growth of data size and the development of the field leave the following problems open. (1) As the scale of network data increases, the number of inter-node dependencies grows exponentially, so algorithms for mining network data need more efficient data representations and processing mechanisms. (2) Real-world network data are usually highly sparse, i.e., the known information in the dataset is much less than the unknown information that needs to be predicted, which requires algorithms with strong induction and generalization capabilities. (3) Algorithms need to consider the specific semantics of the data in realistic scenarios. For example, in signed social networks, the relationships between users include not only friendship or trust but also hostility or distrust, and friend versus enemy and trust versus distrust are semantically opposed.

To address these problems, this dissertation proposes two representation learning algorithms for signed social networks. The proposed algorithms automatically learn feature vectors for large-scale networks, and experiments show that they provide efficient data representations for data mining algorithms. To address the sparsity of network data, this dissertation first extracts the implicit higher-order dependencies and network hierarchies in signed social networks, and then uses them to enhance the learned network representation vectors. Finally, by representing shopping history data as a heterogeneous network, this dissertation applies heterogeneous network data mining to a practical online shopping scenario, where it alleviates the problem of missing information in short sessions. Specifically, the main work is as follows.

(1) A signed network representation learning algorithm that learns link representation vectors is proposed. Existing network representation learning algorithms learn representation vectors for the nodes in the network and then use feature engineering to derive representation vectors for links. This approach not only loses structural information and introduces inconsistency, but also requires substantial analysis and design work by domain experts for link-based network data mining tasks. To address this problem, this dissertation proposes a signed network representation learning algorithm that learns node and link representation vectors simultaneously. The algorithm introduces higher-order similarity relations between non-adjacent nodes to alleviate data sparsity, and introduces assumptions about the interrelationships between nodes and links to reduce its time and space complexity. The proposed framework is implemented in two ways, with a deep neural network and with matrix factorization, and extensive experiments verify its effectiveness.

(2) A signed network representation learning algorithm based on non-Euclidean space is proposed. Unlike existing algorithms, it can extract and represent the implicit hierarchical relationships in signed networks. By analyzing several real-world social network datasets, this dissertation shows that an implicit hierarchy is widely present in real signed social networks. However, existing signed network representation learning algorithms do not consider this common structural feature, and existing node representation methods based on Euclidean space cannot represent the hierarchical relationships among nodes. To address these problems, the proposed algorithm embeds signed social networks in hyperbolic space and uses the relative positions of node representation vectors in that space to represent the hierarchical structure of the network. An efficient optimization algorithm is derived for it from structural balance theory and Riemannian gradient descent. Finally, the algorithm is evaluated on multiple tasks over real-world datasets, and the experimental results show that it is able to extract and represent the implicit hierarchical structure of signed networks.

(3) A recommendation algorithm for the short-session recommendation problem is proposed. This work shows that shopping data in the recommendation scenario can be viewed as a heterogeneous network, and uses network data mining to extract and exploit the similarity relationships between users and the dependencies between sessions in order to alleviate the problem of missing information in short sessions. A session is a collection of items that a user interacts with in a short period of time, and sessions containing only a few items (called short sessions) are common in realistic datasets. Because short sessions contain limited interaction information, it is difficult for recommendation algorithms to accurately estimate users' current needs and product preferences from them. To improve stability and performance, existing algorithms usually remove short sessions from the dataset during data preprocessing. Although this preprocessing makes the task easier, it ignores the short-session recommendation problem that is widespread in real recommendation scenarios, and therefore reduces the performance of existing algorithms on short sessions. To address this problem, this dissertation models session-based shopping data as a heterogeneous network, and extracts and exploits the relationships among users in the dataset, as well as the implicit similarity dependencies among sessions, to complement the limited information in short sessions. Finally, a comparison with existing state-of-the-art recommendation algorithms verifies that the proposed algorithm effectively improves recommendation accuracy in this scenario.
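
The idea of treating session-based shopping data as a heterogeneous network can be illustrated with a small Python sketch. It is not the algorithm proposed in the dissertation: the toy data, the function names (`build_hetero_graph`, `related_sessions`, `candidate_items`) and the scoring rule are all assumptions made for illustration, and a simple count over session-user-session and session-item-session paths stands in for the learned model. It only shows how links to users and to other sessions can supply candidates that a short session lacks on its own.

```python
from collections import defaultdict

# Hypothetical toy data: (user, session id, items in the session).
sessions = [
    ("u1", "s1", ["shoes", "socks", "laces"]),
    ("u1", "s2", ["socks"]),                     # a short session
    ("u2", "s3", ["socks", "sandals", "shoes"]),
    ("u3", "s4", ["laptop", "mouse"]),
]

def build_hetero_graph(sessions):
    """Build an undirected graph over typed nodes such as ("user", "u1")."""
    adj = defaultdict(set)
    for user, sid, items in sessions:
        u, s = ("user", user), ("session", sid)
        adj[u].add(s)
        adj[s].add(u)
        for item in items:
            i = ("item", item)
            adj[s].add(i)
            adj[i].add(s)
    return adj

def items_of(adj, session_node):
    """Names of the items linked to a session node."""
    return {name for kind, name in adj[session_node] if kind == "item"}

def related_sessions(adj, session_node):
    """Sessions two hops away, through a shared user or a shared item."""
    related = set()
    for neighbor in adj[session_node]:
        for other in adj[neighbor]:
            if other[0] == "session" and other != session_node:
                related.add(other)
    return related

def candidate_items(adj, sid):
    """Score unseen items by how many related sessions contain them."""
    s = ("session", sid)
    seen = items_of(adj, s)
    scores = defaultdict(int)
    for other in related_sessions(adj, s):
        for item in items_of(adj, other) - seen:
            scores[item] += 1
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

graph = build_hetero_graph(sessions)
print(candidate_items(graph, "s2"))
```

In this toy data the short session `s2` contains a single item, yet it reaches `s1` through its user and both `s1` and `s3` through the shared item, so items from those sessions become candidates. A session with no shared user or item, such as `s4`, gets no candidates under this rule.
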
Keywords/Search Tags: Heterogeneous data, social network data, network representation learning, session-based recommendation