
Stochastic graph grammars: Parameter learning and applications

Posted on: 2011-05-10
Degree: Ph.D
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Sourav Mukherjee
Full Text: PDF
GTID: 2448390002968950
Subject: Computer Science
Abstract/Summary:
Although traditional data mining algorithms often assume the data to have a feature-vector representation, emerging domains for data mining such as bioinformatics and social networks require the underlying data to be represented as graphs. In this thesis, we present stochastic graph grammars as probability models suitable for mining graph databases.

We first consider the problem of learning the parameters of stochastic graph grammars from training data, and observe that the only known solution to this problem, PEGG (Parameter Estimation for Graph Grammars), relies on bottom-up parsing of the training instances. Parsing can be prohibitively expensive for many real grammars. To overcome this difficulty, we present the QuickPEGG algorithm, which learns the grammar parameters more efficiently. QuickPEGG exploits the fact that, for most grammars, the probability distributions of the number of nodes and the number of edges of the underlying graph distribution contain sufficient information to infer the grammar parameters, rendering parsing unnecessary for parameter learning.

We then consider applications of the learned grammar to knowledge discovery in the underlying domain. In the existing literature, most applications of graph grammars are dominated by sampling and parsing. However, we show that once a stochastic graph grammar has been learned, interesting statistics such as the degree distribution may be inferred from it. Because of this property, stochastic graph grammars may be used as generative models for useful families of networks, such as scale-free networks, in which the degree of a node follows a power-law distribution.

In summary, our work establishes stochastic graph grammars as a framework suitable for mining graph databases; the algorithms presented in this thesis improve our ability to efficiently learn grammar parameters from training data. The applications shown in this thesis enable us to obtain a deeper understanding of the underlying graph distribution once the grammar has been learned.
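The central idea behind QuickPEGG, as described in the abstract, is that the size distribution of generated graphs carries enough information to recover the grammar's rule probabilities without parsing. The sketch below illustrates this on a deliberately simplified, hypothetical grammar (not the grammars studied in the thesis): a single nonterminal rewrites into two nonterminals with probability p or into a terminal node with probability 1 - p, so the expected number of nodes is 1 / (1 - 2p), and p can be recovered by inverting the empirical mean size.

```python
import random

def sample_graph_size(p, rng):
    """Sample the node count of one derivation from a toy stochastic
    grammar: each pending nonterminal becomes a node, then rewrites
    into two new nonterminals with probability p (p < 0.5), or
    terminates with probability 1 - p."""
    nodes = 0
    pending = 1  # number of nonterminals still to rewrite
    while pending:
        pending -= 1
        nodes += 1
        if rng.random() < p:
            pending += 2
    return nodes

def estimate_p(sizes):
    """Recover p from sampled sizes by inverting E[N] = 1 / (1 - 2p),
    mirroring the idea of estimating grammar parameters from the size
    distribution instead of parsing each training graph."""
    mean = sum(sizes) / len(sizes)
    return (1 - 1 / mean) / 2

rng = random.Random(0)
true_p = 0.3
sizes = [sample_graph_size(true_p, rng) for _ in range(20000)]
print(round(estimate_p(sizes), 3))  # close to 0.3
```

Note that the grammar, the closed-form mean, and the estimator here are illustrative assumptions for a one-parameter case; QuickPEGG itself addresses general stochastic graph grammars, where both node- and edge-count distributions are used.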
Keywords/Search Tags: Graph, Parameter learning, Data, Applications, Distribution, Underlying, Mining