
Research On Degree-biased Sampling Algorithm For Large-scale Network Representation Learning

Posted on: 2020-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhang
GTID: 2370330590458335
Subject: Computer system architecture
Abstract/Summary:
Network representation learning aims to represent the nodes of a network as low-dimensional, dense, real-valued vectors that serve as features for classical network analysis tasks such as classification, prediction, and visualization. Traditional network representation learning methods obtain node representations by using matrix factorization to reduce dimensionality. Because they lack scalability and generality, they have gradually been replaced by a new family of methods based on deep learning. These methods usually sample node sequences with random walks and then train node vectors with a neural network. However, they all ignore the scale-free nature of real networks and apply a "one size fits all" sampling strategy to every node. This introduces a large amount of redundant information into the generated node sequences and prevents them from faithfully preserving the structure of the original network, which greatly limits both the effectiveness and the efficiency of network representation learning.

To address this, a degree-biased, variable-length random walk with backtracking, DiaRW, is proposed. First, a degree-biased backtracking mechanism is added to the uniform random walk by letting walks from high-degree nodes backtrack probabilistically, so that the topological structure can be extracted more fully by exploiting the central role of high-degree nodes. Second, a centrality-based variable-length strategy replaces the fixed walk length, reducing the redundant information collected from low-degree nodes. By accounting for the scale-free nature of real networks, DiaRW extracts structural information more efficiently and accurately, improving both the effectiveness and the efficiency of network representation learning.

Experimental results show that DiaRW greatly improves the efficiency of network representation learning while maintaining the quality of the node vectors. On a network with millions of nodes (YouTube), it completes representation learning for all nodes in only 58 minutes, ten times faster than Node2Vec. Moreover, the learned vectors achieve improvements of 8.1% in Micro-F1 and 9.6% in Macro-F1 on the multi-label node classification task.
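The abstract describes DiaRW's two mechanisms only qualitatively. As a rough illustration, the following is a minimal Python sketch of a degree-biased, variable-length random walk with backtracking. It is not the thesis's implementation: the backtracking rule (probability proportional to the degree of the node just left, scaled by a hypothetical `alpha`), the length rule (degree used as the centrality measure, scaled linearly between hypothetical `min_len` and `max_len`), and all function names are assumptions made for illustration only.

```python
import random
from typing import Dict, List, Optional

# Undirected graph as an adjacency list: node -> list of neighbours.
Graph = Dict[int, List[int]]


def backtrack_prob(prev_deg: int, max_deg: int, alpha: float = 0.5) -> float:
    """Hypothetical rule: the chance of stepping back to the node we just left
    grows linearly with that node's degree, reaching alpha for the largest hub."""
    return alpha * prev_deg / max_deg


def walk_length(start_deg: int, max_deg: int, min_len: int = 5, max_len: int = 40) -> int:
    """Hypothetical centrality-based length, using degree as the centrality:
    low-degree start nodes get short walks, hubs get long ones."""
    return min_len + round((max_len - min_len) * start_deg / max_deg)


def biased_walk(g: Graph, start: int, max_deg: int, rng: random.Random) -> List[int]:
    """One variable-length walk with degree-biased backtracking (illustrative)."""
    length = walk_length(len(g[start]), max_deg)
    walk = [start]
    prev: Optional[int] = None
    while len(walk) < length:
        cur = walk[-1]
        if not g[cur]:  # isolated node: nowhere to go
            break
        if prev is not None and rng.random() < backtrack_prob(len(g[prev]), max_deg):
            nxt = prev  # step back towards the (possibly high-degree) node we came from
        else:
            nxt = rng.choice(g[cur])  # otherwise take a uniform random step
        prev = cur
        walk.append(nxt)
    return walk


def generate_corpus(g: Graph, walks_per_node: int = 2, seed: int = 0) -> List[List[int]]:
    """Walk sequences starting from every node, repeated walks_per_node times."""
    rng = random.Random(seed)
    # Guard against division by zero when every node is isolated.
    max_deg = max(1, max(len(nbrs) for nbrs in g.values()))
    return [biased_walk(g, v, max_deg, rng) for _ in range(walks_per_node) for v in g]


if __name__ == "__main__":
    # A hub (node 0) with five neighbours, one of which leads on to node 6.
    g = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0, 6], 6: [5]}
    for walk in generate_corpus(g, walks_per_node=1):
        print(walk)
```

In this toy graph the hub's walks have length 40 while each degree-1 node's walks have length 12, which mirrors the stated goal of spending fewer samples on low-degree nodes. The resulting sequences would then be used to train node vectors with a neural embedding model (for example skip-gram, as Node2Vec does); that training step is omitted here.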
Keywords/Search Tags: Network Representation Learning, Scale-free Network, Random Walk