
Information Cascade Diffusion Study Based On Graph Neural Networks

Posted on: 2022-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
GTID: 2518306764977459
Subject: Journalism and Media
Abstract/Summary:
Advances in the Internet, mobile devices, and social networks have enabled explosive growth of (web) content such as news, tweets, posts, and advertisements, which is rapidly disseminated through the network. Information cascades are an essential dynamical process in the study of information diffusion. Under the right conditions, an information cascade that starts from a single node or a group of nodes can spread over a large part of the network. In the last decade, extensive data mining research has focused on information cascade diffusion, e.g., feature-based methods, probabilistic statistical approaches, and deep learning techniques that model information dispersal.

However, despite promising improvements in cascade prediction, current state-of-the-art methods face the following key challenges: (1) they lack effective dynamical heterogeneous graph modeling and thus fail to capture the complex dependencies and dynamic relations between different types of entities; (2) they lack a fast and efficient node sampling strategy for large-scale datasets; (3) they rely on extensive hand-crafted feature engineering that does not generalize from one domain to another and is not easy to implement; (4) they rely heavily on (semi-)supervised training, for which large amounts of labeled data are necessary but expensive or hard to obtain; (5) existing data augmentations based on semantic relatedness ignore the structural features of information propagation.

To address challenges (1)-(3), we introduce SI-HDGNN, an end-to-end prediction model that learns a heterogeneous representation of information cascades via a Heterogeneous Dynamical Graph Neural Network. It models the dynamically evolving process of information diffusion while capturing the rich structures and semantics embedded in large-scale heterogeneous graphs. SI-HDGNN bridges the gap between dynamical GNNs and heterogeneous information network (HIN) embedding, which have largely been studied independently in prior work. SI-HDGNN learns node representations with a newly designed heterogeneous GNN that aggregates the features of neighboring nodes using a fast weighted contextualized node sampling strategy. In addition, SI-HDGNN is a temporal-attentive representation network that preserves the unevenly distributed information impact of nodes. It also captures the dynamic evolution of nodes and the temporal dependencies among heterogeneous entities by encoding temporal cascading information into node representations.

To address challenges (4)-(5), we propose a self-supervised learning framework, RDEA, in which three novel event augmentation strategies are designed (i.e., node masking, subgraph sampling, and edge dropping). First, we permute both content features and propagation structures to generate positive samples for cascade events. The intrinsic data correlation is thus used to derive self-supervision signals and to enhance the information cascade representation through contrastive pre-training on the augmented data. Then we fine-tune the model with labels to make the final prediction.

To further demonstrate the validity and reliability of the proposed models, this thesis conducts extensive experiments, e.g., experimental evaluations, ablation analysis, a case study, and performance in extended scenarios, on various large-scale, publicly available datasets.
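The three event augmentation strategies named in the abstract (node masking, subgraph, and edge dropping) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the data layout (a feature dictionary plus a list of directed propagation edges), the rates, and the particular subgraph expansion procedure are all assumptions made for illustration only.

```python
import random


def mask_nodes(features, rate, rng):
    """Node masking: replace the feature vector of each node with zeros
    with probability `rate` (hypothetical helper, not from the thesis)."""
    masked = {}
    for node, vec in features.items():
        masked[node] = [0.0] * len(vec) if rng.random() < rate else list(vec)
    return masked


def drop_edges(edges, rate, rng):
    """Edge dropping: remove each propagation edge independently
    with probability `rate`."""
    return [edge for edge in edges if rng.random() >= rate]


def sample_subgraph(edges, root, max_nodes, rng):
    """Subgraph augmentation (assumed procedure): grow a node set outward
    from `root` in random order until `max_nodes` nodes are kept, then
    return the edges whose endpoints are both in that set."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    kept = {root}
    frontier = [root]
    while frontier and len(kept) < max_nodes:
        node = frontier.pop(rng.randrange(len(frontier)))
        for child in children.get(node, []):
            if len(kept) >= max_nodes:
                break
            if child not in kept:
                kept.add(child)
                frontier.append(child)
    return [(s, d) for s, d in edges if s in kept and d in kept]


# Toy cascade: u0 posts, u1 and u2 reshare from u0, u3 reshares from u2.
features = {"u0": [1.0, 0.5], "u1": [0.2, 0.9], "u2": [0.7, 0.1], "u3": [0.4, 0.4]}
edges = [("u0", "u1"), ("u0", "u2"), ("u2", "u3")]

rng = random.Random(0)
# Two augmented views of the same cascade event form a positive pair
# for contrastive pre-training.
view_a = (mask_nodes(features, 0.3, rng), drop_edges(edges, 0.3, rng))
view_b = (features, sample_subgraph(edges, "u0", 3, rng))
```

In a contrastive setup, an encoder would embed both views, and the loss would pull the two embeddings of the same event together while pushing apart embeddings of different events. The encoder and the loss are left out here because the abstract does not specify them.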
Keywords/Search Tags: Information Cascade, Information Diffusion, Graph Neural Networks, Heterogeneous Graph, Self-Supervised Learning