Font Size: a A A

Multi-dimensional analysis of graph data

Posted on:2010-10-28Degree:Ph.DType:Dissertation
University:University of Illinois at Urbana-ChampaignCandidate:Chen, ChenFull Text:PDF
GTID:1448390002488844Subject:Computer Science
Abstract/Summary:
In recent years, databases and data warehouse systems have been evolving from handling normalized spreadsheets stored in relational databases, to managing and analyzing diverse application-oriented data with complex interconnecting structures. Responding to this emerging trend, large-scale graph datasets have been growing rapidly and showing their critical importance in many applications, such as the analysis of XML, social networks, Web, biological data, multimedia data and spatiotemporal data.;Can we extend useful functions of databases and data warehouse systems to handle graph structured data? In this dissertation, we argue that it is critically important to perform multi-dimensional analysis on graphs and propose a fast and user-friendly Graph OLAP framework that can interactively view and analyze graph datasets from different perspectives and with multiple granularities. Starting from basic definitions, i.e., dimensions and measures in the Graph OLAP scenario, we build a conceptual model for data cubes constructed on graphs, discuss how they can be efficiently computed, and propose a few optimizations. To make the whole framework an integral one, we spend substantial time on issues that are tightly connected with the success of the overall knowledge discovery system, in particular, efficient multi-granularity mining and effective pattern presentation. First, as we perform OLAP operations like roll-ups and drill-downs, the structure of the graph dataset is correspondingly shrunk or expanded, which might change the underlying patterns. Taking frequent subgraphs for example, we analyze the relationship between patterns embedded on different abstraction levels; and interestingly, we find out that high-level patterns ("high" denotes generalized/summarized graphs) can almost surely reflect those low-level ones, if the summarization scheme is carefully designed. This property motivates us to develop a multi-granularity mining method, as the computation on shrunk versions of graphs will always be more efficient. Second, when users conduct multi-dimensional analysis and the Graph OLAP system displays results returned for different cube cells, there exists a usability bottleneck: Due to the structural complexity of graphs, what the state-of-art mining algorithms give is a lengthy list of patterns, which might be very similar to each other except small variations. This problem is addressed by postprocessing the output patterns and generating a small set of structural representatives to delegate the rest.;In conclusion, the multi-dimensional analysis framework could lead to intuitive and insightful knowledge discovery on graphs, especially when the data is large and complex. Given the emerging trend of huge information networks as listed above, it is an important research topic to devote more efforts to. We point out a few possible future works, especially discovery-driven Graph OLAP. We believe that this is an interesting direction to go, and give our initial thoughts on it.
Keywords/Search Tags:Graph, Data, Multi-dimensional analysis
Related items