
The Modeling Of Spatio-temporal Features By Using Of Graph Representation Learning And Its Applications In Network Anomaly Detection For Information Network

Posted on: 2024-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C M Yang
GTID: 1520307079950869
Subject: Communication and Information System
Abstract/Summary:
Information networks are key infrastructure of modern society. How to obtain accurate and complete status and observation data from an information network, and how to determine and predict from the acquired data whether the network is reliable, trustworthy, and free of errors, are among the most important research and operation-management questions for academia, industry, application, and management. This dissertation studies the spatio-temporal correlations among the nodes and edges of a network when the data on those nodes and edges (i.e., network observation data) have complex, dynamic, and time-varying features. The spatio-temporal correlations are simplified and analyzed using a mechanism dominated by hidden variables, which yields a spatio-temporal correlation model of network observation data. Based on this model, the dissertation proposes novel methods for detecting function-wise and performance-wise anomalies on nodes and edges using features learned by graph representation learning. The main research contents and contributions are summarized as follows.

1. Network regression by graph representation based on hidden variable modeling and anomaly detection

Because the spatio-temporal correlations of complex network data are difficult to formalize explicitly, the dissertation applies the principle of hidden variable modeling and describes the spatio-temporal correlations between nodes with deterministic spatio-temporal correlation equations. In this way, deterministic spatio-temporal correlations of the complex network data are established. Based on the hidden variable model, the dissertation applies the principle of graph representation learning to convert the problem of approximating hidden variables into the problem of representing spatio-temporal features, and proposes a method to represent the spatio-temporal features in graph data regression. The dissertation designs a new method to detect anomalous nodes based on the node data regression errors. To maintain the spatio-temporal correlations between the hidden variables in data regression, the dissertation designs a graph representation model based on the graph neural network and the recurrent neural network, named the Multi-scale Spatio-Temporal Neural Network (MSTNN). This model extracts multi-scale spatial features from multi-range neighborhoods and multi-scale temporal features from multi-range time intervals, and fuses the multi-scale spatio-temporal features through a branch structure. In experiments on the standard IP backbone network datasets [1,2], the MSTNN-based anomalous node detection algorithm improves the detection accuracy by more than 14% in terms of F1 score.

2. The reconstruction of stochastic network data and network anomaly detection

Because the randomness of node data and topology in dynamic networks cannot be formalized by deterministic equations, the dissertation establishes a conditional distribution model with hidden variables. To solve the problem of inferring the distributions of the hidden variables, the dissertation proposes a method that performs distribution inference through reconstruction, together with an implementation scheme for detecting anomalous nodes and anomalous edges. Data are judged anomalous when the conditional probability of the current data, conditioned on the node hidden variable, is less than a threshold derived from normal data. The dissertation proposes a method for approximating node hidden variables using random data reconstruction. Under the framework of variational auto-encoding, this method converts the problem of inferring the joint distribution of node data and edge data into the problem of estimating the parameters of the posterior and the prior of the approximated node hidden variables and of the reconstruction distributions of nodes and edges. To estimate these parameters, the dissertation designs the Multi-scale Variational Graph Recurrent Auto-Encoder (M-VGRAE) model. In experiments on the standard dynamic network datasets [3,4], the M-VGRAE-based anomalous node and anomalous edge detection algorithm improves the detection performance by more than 15% in terms of the area under the receiver operating characteristic curve (AUC).

3. Network anomaly detection using directed-information-based feature representation learning

To reduce the performance deterioration of graph representation learning in the presence of interference samples, the dissertation applies the principle of network information flows and establishes the spatio-temporal correlations of network data by drawing an analogy between the interaction process of network parameters and a class of communication processes. The dissertation also derives a representation learning objective function that maximizes the mutual information between node hidden variables and node representation variables. From the communication viewpoint, the representation model is equivalent to a special communication process, and a representation learning loss function based on capacity maximization (Cap Max) is designed accordingly. Through analysis of the communication capacity, the Cap Max loss function serves as an asymptotic expression of the representation learning objective function. Because of the spatio-temporal correlations of network data, the sub-representation model at each node is equivalent to a multi-access channel with memory and feedback, whose capacity must be measured by the directed information rather than the mutual information. Hence, the dissertation designs an algorithm that detects anomalous nodes and anomalous edges by estimating the amount of directed information. Experimental results on the standard dynamic network datasets [5,6] show that, when interference samples are present, anomaly detection with the Cap Max loss function effectively improves the accuracy of detecting anomalous nodes and anomalous edges for multiple spatio-temporal feature representation models.

4. Robustness evaluation of network anomaly detection methods based on graph representation learning

The dissertation proposes three network anomaly detection methods based on graph representation learning and analyzes and verifies their effectiveness and feasibility. In actual network scenarios, however, network data often suffer perturbations caused by equipment failures and service overloads, which degrade anomaly detection performance. To verify and evaluate the robustness of the three methods under data perturbations, the dissertation analyzes the perturbations that would lead to incorrect determination of network data anomalies and generates data perturbations by solving a box-constrained optimization problem. Considering the variety of data perturbation scenarios, the dissertation designs the U-ASG method to implement multiple generation schemes, such as single-node perturbation, multi-node perturbation, continuous perturbation, and random sampling perturbation. Experiments on the standard datasets of IP backbone networks and dynamic networks [1,3] show that the graph representation learning methods are able to reduce detection errors on perturbed data by more than 35%.

In summary, the dissertation represents the complex and analytically intractable spatio-temporal correlations of network data in a form that can be analyzed, through a mechanism dominated by node hidden variables, and formalizes the spatio-temporal relationships of the node features using the graph representation learning principle from AI technology. Based on these new research directions and methods, and validated by theoretical analysis and experiments on the standard datasets, the dissertation derives innovative results on feature representation methods, anomaly detection algorithms, and robustness evaluation against data perturbations.
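
The regression-error idea behind the first contribution (flag a node when its regression error is large relative to errors seen on normal data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the threshold rule (mean plus k standard deviations of normal errors), the function names `fit_threshold` and `detect_anomalous_nodes`, and the sample values. The MSTNN itself is not reproduced; its predictions are simply passed in as inputs.

```python
from statistics import mean, stdev

def fit_threshold(normal_errors, k=3.0):
    """Derive a threshold from regression errors on normal data.

    The rule mean + k * std is an assumed, illustrative choice.
    """
    return mean(normal_errors) + k * stdev(normal_errors)

def detect_anomalous_nodes(observed, predicted, threshold):
    """Return ids of nodes whose absolute regression error exceeds the threshold."""
    return [node for node in observed
            if abs(observed[node] - predicted[node]) > threshold]

# Hypothetical regression errors measured on data known to be normal.
normal_errors = [0.10, 0.12, 0.08, 0.11, 0.09]
threshold = fit_threshold(normal_errors)

# Hypothetical observed node values and the values a regression model predicted.
observed = {"a": 1.00, "b": 2.50, "c": 0.95}
predicted = {"a": 1.05, "b": 1.00, "c": 1.00}

anomalous = detect_anomalous_nodes(observed, predicted, threshold)
print(anomalous)
```

With these sample values only node "b" deviates strongly from its prediction, so it is the only node flagged. The same compare-against-a-threshold pattern would apply to the second contribution if the error were replaced by a conditional probability and the comparison direction were reversed.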
Keywords/Search Tags: Network Anomaly Detection, Spatio-Temporal Feature, Neural Network, Representation Learning, Directed Information