Font Size: a A A

Context-free graph grammar induction using the minimum description length principle

Posted on:2004-02-25Degree:Ph.DType:Thesis
University:The University of Texas at ArlingtonCandidate:Jonyer, IstvanFull Text:PDF
GTID:2468390011976042Subject:Computer Science
Abstract/Summary:
This work is concerned with developing a machine learning algorithm for inducing context-free graph grammars, which is guided by the minimum description length principle. Recognizing the representational power of graphs and the ability of graph grammars to generalize, we develop an approach that extends the expressive power of hypotheses learned from examples over algorithms that learn static patterns only, to equal the expressive power of graph grammars. The induced graph grammar is our hypothesis for concepts. We motivate our work from different angles. First, we point to the need for representing more complex concepts, those that include recursion, variability and relationships among data points. Second, we point out the lack of existing graph-based systems to address these concerns. Third, we recognize the ability of other machine learning approaches to address some of these concerns, and that the combination of graph-based systems with these features will have practical utility. We survey related work, among which are graph-based and inductive logic programming systems. These systems are the closest to our approach, either because they represent data in graph format, or they are able to learn recursive concepts. We describe our approach in detail, which includes the ability to learn recursive concepts, concepts that include variable data points, and those that include relationships among data points. We extend our approach for both discovery and concept learning modes. We find that the class of graph grammars learned by our algorithm is equivalent to the class of regular graph grammars. The algorithm is implemented as an extension to the Subdue knowledge discovery system, and is named SubdueGL. We present a rigorous evaluation methodology used to validate the new algorithm. We find that the system tolerates noise to a certain degree. We also evaluate the system on several real-world examples, among which are an anti-terrorism domain, protein sequences of primary and secondary structure, and a breast cancer domain. We compare the system to prominent competing approaches on further examples, which are commonly used in the literature to for the purpose of evaluating machine learning systems. We find that our approach performs better than competing ILP systems on the domains tested, and does better that some statistical and connectionist approaches reported in the literature.
Keywords/Search Tags:Graph, Machine learning, Systems, Approach, Algorithm
Related items