Multi-dimensional analysis of graph data

Posted on:2010-10-28

Degree:Ph.D

Type:Dissertation

University:University of Illinois at Urbana-Champaign

Candidate:Chen, Chen

Full Text:PDF

GTID:1448390002488844

Subject:Computer Science

Abstract/Summary:

In recent years, databases and data warehouse systems have been evolving from handling normalized spreadsheets stored in relational databases, to managing and analyzing diverse application-oriented data with complex interconnecting structures. Responding to this emerging trend, large-scale graph datasets have been growing rapidly and showing their critical importance in many applications, such as the analysis of XML, social networks, Web, biological data, multimedia data and spatiotemporal data.;Can we extend useful functions of databases and data warehouse systems to handle graph structured data? In this dissertation, we argue that it is critically important to perform multi-dimensional analysis on graphs and propose a fast and user-friendly Graph OLAP framework that can interactively view and analyze graph datasets from different perspectives and with multiple granularities. Starting from basic definitions, i.e., dimensions and measures in the Graph OLAP scenario, we build a conceptual model for data cubes constructed on graphs, discuss how they can be efficiently computed, and propose a few optimizations. To make the whole framework an integral one, we spend substantial time on issues that are tightly connected with the success of the overall knowledge discovery system, in particular, efficient multi-granularity mining and effective pattern presentation. First, as we perform OLAP operations like roll-ups and drill-downs, the structure of the graph dataset is correspondingly shrunk or expanded, which might change the underlying patterns. Taking frequent subgraphs for example, we analyze the relationship between patterns embedded on different abstraction levels; and interestingly, we find out that high-level patterns ("high" denotes generalized/summarized graphs) can almost surely reflect those low-level ones, if the summarization scheme is carefully designed. This property motivates us to develop a multi-granularity mining method, as the computation on shrunk versions of graphs will always be more efficient. Second, when users conduct multi-dimensional analysis and the Graph OLAP system displays results returned for different cube cells, there exists a usability bottleneck: Due to the structural complexity of graphs, what the state-of-art mining algorithms give is a lengthy list of patterns, which might be very similar to each other except small variations. This problem is addressed by postprocessing the output patterns and generating a small set of structural representatives to delegate the rest.;In conclusion, the multi-dimensional analysis framework could lead to intuitive and insightful knowledge discovery on graphs, especially when the data is large and complex. Given the emerging trend of huge information networks as listed above, it is an important research topic to devote more efforts to. We point out a few possible future works, especially discovery-driven Graph OLAP. We believe that this is an interesting direction to go, and give our initial thoughts on it.

Keywords/Search Tags:

Graph, Data, Multi-dimensional analysis

Related items

1	The Research And Implementation Of Large Scale Multi-dimensional Network Analysis Model
2	Multi-dimensional analysis of graph data
3	Design And Implementation Of Query Analysis Client Of OLAP
4	MGTag:A Multi-dimensional Graph Labeling Scheme For Fast Reachability Queries
5	Study And Development Of Multi-Dimensional Traffic Accident Analysis System Based On GIS
6	Research And Implementation Of Multi-dimensional Data Analysis System For Education Big Data
7	Non-local Variational Model Of Data Multi-classification Problem On Graph And Its Fast Algorithm Research
8	Design And Realization Of Multi-Dimensional Online Data Analysis System-Simplified Analysis System
9	The Research Of A Web-Based Multi-Dimensional Data Analytical Tool
10	The Research And Application Of Multi-dimensional Data Visualization Analysis Method