Research On The Key Technology Of Network Representation Learning For Social Media

Posted on: 2020-07-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X T Cheng
GTID: 1368330620953196
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of social media networks such as Facebook, Twitter, Weixin and Weibo, a large amount of network data is being generated. Representing these data appropriately is the basis for data mining on large-scale networks. Network Representation Learning (NRL), also known as Network Embedding (NE), is regarded as the most promising approach to this problem. Existing NRL methods obtain low-dimensional vector representations of nodes through matrix factorization or neural networks, using structural information together with other heterogeneous information. However, for massive, dynamic and heterogeneous social media data, existing NRL methods still have the following problems:

1) Because users in social media are complex and their relationships overlap, the node representations generated by existing static NRL methods are difficult to distinguish in node classification tasks.
2) Interactions between nodes are frequent and dynamic, but existing dynamic NRL methods are inadequate for modeling how nodes evolve within a given time window.
3) Given the complex semantic relationships among different objects in social media, existing methods are deficient in integrating the semantic information of network links into node representations.
4) User data in social media come from multiple sources and contain a lot of noise, and existing NRL methods that integrate heterogeneous information are not robust in high-noise scenarios.

To address these problems, this dissertation is supported by a pre-research project and by the National Natural Science Foundation of China project "Link Prediction Principles and Methods for Directed Networks". Using the abundant user data in social media networks, it studies the key technologies of network representation learning for social media. The main research results are as follows:

1. To address the problem that the node representations generated by existing static NRL methods are difficult to distinguish in
the task of node classification, we propose a new network representation learning method, DML-NRL, which integrates the label information of nodes. The model makes full use of label information and, for the first time, introduces deep metric learning into NRL. It integrates the distance information between nodes of different classes into the training process that generates node representations. By making the NRL model incorporate a measure of global information, this effectively improves the accuracy of the NRL algorithm. Comparative simulations against existing methods on real-world datasets show that the accuracy of this method improves by about 10% on the multi-label node classification task, and that the classes are more clearly separated in the visualization task.

2. To address the problem that existing dynamic NRL methods are insufficient for modeling the time-varying information of network structure, we propose a new network representation learning method, RWR-STNE, which integrates temporal and spatial change information. First, we use the network structure at the current and past time steps of the dynamic network to construct a spatio-temporal trajectory graph of user nodes within a given time window, so that each user's spatio-temporal changes are embedded in a static trajectory graph. Then, we use the random walk with restart (RWR) algorithm to obtain random walk sequences of nodes in the trajectory graph. Finally, we obtain the trajectory representations of user nodes within the time window using the classical Skip-gram model. Experimental results show that the proposed method effectively fuses the spatio-temporal information of nodes and improves performance on node classification and link prediction tasks by more than 5%.

3. To address the problem that existing NRL methods are insufficient in describing the rich semantic information of links in social media networks, this dissertation proposes a network representation learning method that
integrates the semantic information of links. First, the method performs random walks on the network guided by metapaths that express different semantics, generating node sequences. Then, based on these node sequences and a metapath weight calculation, the heterogeneous information network is transformed into a weighted sub-network carrying multi-dimensional semantic information. Finally, node representations are obtained by applying the Skip-gram model to the extracted weighted sub-network. Experimental results show that this method can effectively select the important metapaths and generate node representations that integrate the semantic information of different metapaths, and that it outperforms the baseline algorithms on the node classification task.

4. To address the problem that NRL methods based on heterogeneous information are not robust to the high-noise data in social media, this dissertation proposes a network representation fusion method based on Dempster-Shafer (D-S) evidence theory. As network representation learning develops, more and more researchers integrate multi-dimensional attribute information to improve the performance of network representations. Because the information comes from different sources, mutual verification among the sources can improve network representations; however, conflicting information can also degrade the fusion result. To solve this problem, this dissertation proposes a multi-feature decision fusion method. First, we use an SVM to calculate the degree to which each type of attribute information supports the fusion result. Then, we use the evidence combination rule to handle conflicts and to evaluate the fused network representation. We also introduce a confusion matrix to model the local credibility of each classification. Finally, simulation experiments on three types of datasets show that this method can detect conflicts in network representation fusion and improve the performance
of node representations.

Building on the above research on network representation learning, this dissertation also proposes a framework for user behavior analysis, together with encoding methods for different types of data. In addition, based on the characteristics of network data, we take a microblog network and a telecommunication network as typical networks for different scenarios. Feature analysis and simulation experiments on these networks verify the effectiveness of the proposed methods.
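Contribution 1 (DML-NRL) injects label-based distance information into NRL training through deep metric learning. The abstract does not say which metric-learning loss is used, so the following is only a minimal sketch assuming a standard triplet margin loss: an anchor node and a "positive" node share a label, a "negative" node does not, and the loss is zero once the positive is closer than the negative by at least `margin`. The names `sample_triplets` and `metric_loss`, the toy labels and embeddings, and the idea of adding this term to a structure-preserving embedding loss are hypothetical illustrations, not the dissertation's method.

```python
import math
import random
from collections import defaultdict


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the same-label node is closer to the anchor
    than the different-label node by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triplets(labels, num, rng):
    """Hypothetical sampler of (anchor, positive, negative) triplets from node labels."""
    by_label = defaultdict(list)
    for node in sorted(labels):
        by_label[labels[node]].append(node)
    nodes = sorted(labels)
    # A valid anchor needs another node with its label and at least one other label.
    anchors = [n for n in nodes if len(by_label[labels[n]]) > 1 and len(by_label) > 1]
    if not anchors:
        raise ValueError("need at least two labels and a label shared by two nodes")
    triplets = []
    for _ in range(num):
        a = rng.choice(anchors)
        positive = rng.choice([n for n in by_label[labels[a]] if n != a])
        negative = rng.choice([n for n in nodes if labels[n] != labels[a]])
        triplets.append((a, positive, negative))
    return triplets


def metric_loss(embeddings, triplets, margin=1.0):
    """Average triplet loss over the sampled triplets."""
    total = sum(triplet_loss(embeddings[a], embeddings[p], embeddings[n], margin)
                for a, p, n in triplets)
    return total / len(triplets)


# Toy example (hypothetical data): two labels, already well separated in 2-D.
labels = {"n1": "x", "n2": "x", "n3": "y", "n4": "y"}
embeddings = {"n1": (0.0, 0.0), "n2": (0.0, 1.0), "n3": (3.0, 0.0), "n4": (3.0, 1.0)}
triplets = sample_triplets(labels, num=8, rng=random.Random(0))
print(metric_loss(embeddings, triplets))  # 0.0: every positive is 1 away, every negative at least 3
```

In a full model, a weighted version of this term would be added to whatever loss preserves network structure, so that gradients pull same-label nodes together and push different-label nodes apart; how DML-NRL actually balances the two is not stated in the abstract.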
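Contribution 2 (RWR-STNE) builds a spatio-temporal trajectory graph from the current and past network snapshots, samples it with random walk with restart, and trains Skip-gram on the walks. The abstract does not specify how the trajectory graph is built, so this sketch assumes one plausible construction: each node is a (user, time step) pair, interactions within a snapshot become edges, and a user's copies in consecutive snapshots are linked. The functions `build_trajectory_graph`, `rwr_walk` and `rwr_corpus`, the `user@t` token format and the toy snapshots are all hypothetical.

```python
import random
from collections import defaultdict


def build_trajectory_graph(snapshots, window):
    """Hypothetical trajectory graph over the last `window` snapshots.

    Nodes are (user, t) pairs. Edges inside snapshot t join users that
    interact at t; temporal edges join a user's copies at t-1 and t.
    """
    adj = defaultdict(set)
    first = max(0, len(snapshots) - window)
    for t in range(first, len(snapshots)):
        for u, v in snapshots[t]:
            adj[(u, t)].add((v, t))
            adj[(v, t)].add((u, t))
    for user, t in list(adj):
        if (user, t - 1) in adj:  # membership test does not create a key
            adj[(user, t)].add((user, t - 1))
            adj[(user, t - 1)].add((user, t))
    return dict(adj)


def rwr_walk(adj, start, length, restart_prob, rng):
    """One random walk with restart: with probability `restart_prob` (or at a
    dead end) jump back to `start`, otherwise step to a uniform random neighbour."""
    walk = [start]
    current = start
    while len(walk) < length:
        neighbours = adj[current]
        if not neighbours or rng.random() < restart_prob:
            current = start
        else:
            current = rng.choice(sorted(neighbours))
        walk.append(current)
    return walk


def rwr_corpus(adj, walks_per_node, length, restart_prob, seed=0):
    """Walk corpus with string tokens such as 'a@2', ready for a Skip-gram model."""
    rng = random.Random(seed)
    corpus = []
    for node in sorted(adj):
        for _ in range(walks_per_node):
            walk = rwr_walk(adj, node, length, restart_prob, rng)
            corpus.append([f"{user}@{t}" for user, t in walk])
    return corpus


snapshots = [
    [("a", "b"), ("b", "c")],  # t = 0 (falls outside a window of 2)
    [("a", "c")],              # t = 1
    [("c", "d"), ("a", "d")],  # t = 2
]
graph = build_trajectory_graph(snapshots, window=2)
corpus = rwr_corpus(graph, walks_per_node=2, length=5, restart_prob=0.2)
print(len(corpus))  # 10: five trajectory-graph nodes x two walks each
```

The token lists could then be passed to any Skip-gram implementation, for example gensim's `Word2Vec` with `sg=1`. The restart probability keeps walks near their start node, which is one reason RWR suits local, per-user trajectory context.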
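Contribution 3 generates node sequences with metapath-guided random walks and then turns the heterogeneous network into a weighted sub-network using a metapath weight calculation. The walk below follows the common metapath-guided scheme, in which each step may only move to a neighbour of the next type in a symmetric metapath such as U-P-U. The abstract does not give the weight formula, so `weighted_subnetwork` simply counts how often two nodes appear as the endpoints of one metapath instance; treat it as a stand-in. The user (U) and post (P) types, the toy graph and all function names are hypothetical.

```python
import random
from collections import Counter


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk whose node types follow a symmetric metapath, e.g. U-P-U."""
    if metapath[0] != metapath[-1]:
        raise ValueError("metapath must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the metapath's first type")
    cycle = metapath[:-1]  # U-P-U repeats as U, P, U, P, ...
    walk = [start]
    for i in range(1, length):
        wanted = cycle[i % len(cycle)]
        candidates = sorted(n for n in adj[walk[-1]] if node_type[n] == wanted)
        if not candidates:
            break  # no neighbour of the required type: stop early
        walk.append(rng.choice(candidates))
    return walk


def weighted_subnetwork(walks, hop):
    """Hypothetical weighting: count pairs of distinct nodes found at positions
    k*hop and (k+1)*hop, i.e. the two endpoints of one metapath instance."""
    weights = Counter()
    for walk in walks:
        for i in range(0, len(walk) - hop, hop):
            u, v = walk[i], walk[i + hop]
            if u != v:
                weights[tuple(sorted((u, v)))] += 1
    return weights


adj = {
    "u1": ["p1"], "u2": ["p1", "p2"], "u3": ["p2"],
    "p1": ["u1", "u2"], "p2": ["u2", "u3"],
}
node_type = {"u1": "U", "u2": "U", "u3": "U", "p1": "P", "p2": "P"}
metapath = ["U", "P", "U"]
rng = random.Random(42)
walks = [metapath_walk(adj, node_type, u, metapath, length=5, rng=rng)
         for u in ("u1", "u2", "u3") for _ in range(3)]
weights = weighted_subnetwork(walks, hop=len(metapath) - 1)
print(weights)
```

Because only nodes joined by a metapath instance receive weight, users who never share a post (here `u1` and `u3`) stay unconnected in the sub-network, which is how a metapath filters which relationships a Skip-gram model later sees.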
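Contribution 4 fuses the decisions obtained from different attribute representations with D-S evidence theory. The sketch below shows Dempster's rule of combination together with Shafer's reliability discounting over a set of class labels. How the dissertation converts SVM outputs and confusion-matrix credibility into mass functions is not stated in the abstract; here the masses are simply given, and `reliability` stands in, as an assumption, for a per-source credibility weight. The "structure" and "attribute" sources, the two-class frame and the function names are hypothetical.

```python
from itertools import product


def discount(mass, reliability, frame):
    """Shafer discounting: scale each focal element by `reliability` and move
    the remaining belief to the whole frame (total ignorance)."""
    out = {focal: reliability * m for focal, m in mass.items() if focal != frame}
    out[frame] = 1.0 - reliability + reliability * mass.get(frame, 0.0)
    return out


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions with frozenset focal elements.
    Returns (normalised combined masses, conflict K)."""
    combined = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # mass given to contradictory hypotheses
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    return {focal: m / (1.0 - conflict) for focal, m in combined.items()}, conflict


A, B = frozenset({"A"}), frozenset({"B"})
frame = A | B
structure_view = {A: 0.7, B: 0.2, frame: 0.1}  # hypothetical source 1
attribute_view = {A: 0.6, B: 0.3, frame: 0.1}  # hypothetical source 2
fused, k = dempster_combine(structure_view, attribute_view)
print(round(k, 2), round(fused[A], 3))  # 0.33 0.821
```

The conflict K is what makes disagreement detectable: a value close to 1 means the sources strongly contradict each other, and discounting a less credible source before combining shifts its mass to the whole frame, reducing its influence on the fused decision.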
Keywords/Search Tags: network representation learning, metric learning, dynamic network, spatial-temporal graph, metapath, evidence theory, representation fusion