Font Size: a A A

Supervised learning in dynamic streaming graphs

Posted on:2017-09-08Degree:Ph.DType:Thesis
University:Washington State UniversityCandidate:Yao, YiboFull Text:PDF
GTID:2468390014973027Subject:Computer Science
Abstract/Summary:
With the emergence of nEtworked data, graph classification has received considerable interest during the past years. Most approaches to graph classification focus on designing effective kernels to compute similarities for static graphs. However, they become computationally intractable in terms of time and space when a graph is presented in an incremental fashion with continuous updates, i.e., insertions of nodes and edges. In this thesis, we examine the problem of classification in large-scale and incrementally changing graphs. We focus on two classification scenarios: the graph transaction scenario, and the one large graph scenario. We propose a unified framework to study this problem. The framework consists of three components: a subgraph extraction process, an online version of an existing fast graph kernel, and two kernel-based classifiers (SVM and Perceptron). To classify nodes in a single large graph, we design an entropy-based subgraph extraction strategy to select informative neighbor nodes and discard those with less discriminative power, to facilitate an effective classification process. By retaining the support vectors from each learning step, the classification models built using the kernel-based classifiers can be incrementally updated whenever new changes are made to the subject graph. We demonstrate the advantages of our learning techniques by conducting an empirical evaluation on several large-scale real-world graph datasets.;On the other hand, detecting concept drift in data streams has been widely studied in the data mining community. Conventional drift detection methods use classifiers' outputs (e.g., classification accuracy, error rate) as indicators to signal concept changes. As a result, their performance greatly depends on the chosen classifiers. However, there is little work on addressing concept drift in graph-structured data. We present an entropy-based method to effectively detect concept drift in graph streams. Contrary to many related works, we investigate the intrinsic properties of data (i.e., subgraph distribution w.r.t. category labels), instead of monitoring classification outputs. Our approach is combined with the proposed incremental graph learning techniques and tested on synthetic and real-world graph data streams. The experimental results demonstrate the advantage of our method in detecting concept drift as well as improving classification performance.
Keywords/Search Tags:Graph, Classification, Concept drift, Data
Related items