Improved data partitioning for distributed graph-based data mining

Posted on: 2004-06-12
Degree: M.S.
Type: Thesis
University: The University of Texas at Arlington
Candidate: Pant, Amitabh
Full Text: PDF
GTID: 2468390011473211
Subject: Computer Science
Abstract/Summary:
With the explosion in data collection and storage across different fields, various tools that analyze such data have also been created. These datasets can be represented as a graph, with the entities or objects in the dataset represented as vertices or connected subgraphs, and their relationships or the attributes of these objects represented as edges.

However, such a representation leads to large graphs, which in turn means that graph-based data analysis tools have to be scalable. To make such tools scalable, distributed and parallel processing approaches are used. These approaches require the data to be partitioned and distributed to the processors.

This research outlines some refinements to the data partitioning process for graph-based data mining tools and applies the suggested approach to the SUBDUE graph-based data mining system. The main approach suggested by this study is to bias the edge weights of closely connected and frequent subgraphs in the given input graph. This leads to less edge loss within these subgraphs, so connected components of the input are preserved in the same partition during the partitioning phase. Preserving frequent subgraphs in this way can contribute more to the analysis and discovery of interesting patterns in the given data.
Keywords/Search Tags:Graph-based data mining, Data partitioning, Such data, Frequent subgraphs, Distributed, Graph based data
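
The edge-weight biasing idea in the abstract can be illustrated with a small sketch. This is not the thesis's implementation or SUBDUE's partitioner: the function names `bias_edge_weights` and `cut_weight`, the bias value of 10, and the toy graph are all assumptions chosen for illustration. It shows only why raising the weights of edges inside a frequent subgraph makes a weight-minimizing partitioner less likely to cut through that subgraph.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = FrozenSet[str]


def bias_edge_weights(
    edges: Iterable[Tuple[str, str]],
    frequent_edges: Iterable[Tuple[str, str]],
    bias: float = 10.0,  # assumed value, not taken from the thesis
) -> Dict[Edge, float]:
    """Give weight `bias` to edges inside frequent subgraphs, 1.0 to the rest."""
    frequent = {frozenset(e) for e in frequent_edges}
    return {
        frozenset(e): (bias if frozenset(e) in frequent else 1.0)
        for e in edges
    }


def cut_weight(weights: Dict[Edge, float], part_of: Dict[str, int]) -> float:
    """Total weight of edges whose endpoints lie in different partitions."""
    return sum(
        w for edge, w in weights.items()
        if len({part_of[v] for v in edge}) > 1
    )


# Hypothetical toy graph: the edge a-b stands in for a frequent subgraph.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("a", "c")]
frequent_edges = [("a", "b")]

# Two balanced 2-way partitions of the four vertices.
keep = {"a": 0, "b": 0, "c": 1, "d": 1}   # a and b stay together
split = {"a": 0, "c": 0, "b": 1, "d": 1}  # a and b are separated

unbiased = bias_edge_weights(edges, [])            # every edge weighs 1.0
biased = bias_edge_weights(edges, frequent_edges)  # a-b weighs 10.0

# Without biasing, splitting a-b is the cheaper cut (2.0 versus 3.0).
# With biasing, keeping a-b together is cheaper (3.0 versus 11.0).
for name, weights in (("unbiased", unbiased), ("biased", biased)):
    print(name, "keep:", cut_weight(weights, keep),
          "split:", cut_weight(weights, split))
```

In this toy case a partitioner that minimizes cut weight would separate a and b under uniform weights, but would keep them in the same partition once the frequent edge is biased. How the frequent subgraphs are identified, and which partitioner is used, is left outside the sketch.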