Improved data partitioning for distributed graph-based data mining

Posted on:2004-06-12

Degree:M.S.

Type:Thesis

University:The University of Texas at Arlington

Candidate:Pant, Amitabh

GTID:2468390011473211

Subject:Computer Science

Abstract/Summary:

The explosion in data collection and storage across many fields has been accompanied by a variety of tools for analyzing such data. These datasets can be represented as a graph, with the entities or objects in the dataset represented as vertices or connected subgraphs, and their relationships or attributes represented as edges.

However, such a representation leads to large graphs, which in turn means that graph-based data analysis tools have to be scalable. To make such tools scalable, distributed and parallel processing approaches are used. These approaches require the data to be partitioned and distributed to the processors.

This research outlines some refinements to the data partitioning process for graph-based data mining tools and applies the suggested approach to the SUBDUE graph-based data mining system. The main approach suggested by this study is biasing the edge weights of closely connected and frequent subgraphs in the given input graph. This leads to less edge loss within these subgraphs, so connected components of the input are preserved in the same partition during the partitioning phase. Preserving frequent subgraphs in this way can contribute more to the analysis and discovery of interesting patterns in the given data.
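
The edge-weight biasing idea can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the multiplicative `bias` factor, and the toy four-vertex graph are all assumptions made for illustration, and the abstract does not say which partitioner was used or how the weights were actually computed. The sketch only shows why a partitioner that minimizes total cut weight would avoid splitting a subgraph whose edges have been given larger weights.

```python
from typing import Dict, FrozenSet, Iterable, Set

Edge = FrozenSet[str]  # undirected edge {u, v}; self-loops are assumed absent


def bias_edge_weights(
    weights: Dict[Edge, float],
    instances: Iterable[Set[str]],
    bias: float = 10.0,
) -> Dict[Edge, float]:
    """Return a copy of `weights` in which every edge lying entirely inside
    one of the given subgraph instances (vertex sets) is scaled by `bias`.
    The factor is hypothetical; an edge in several instances is scaled once."""
    biased = dict(weights)
    for vertices in instances:
        for edge in weights:
            if edge <= vertices:  # both endpoints belong to the instance
                biased[edge] = weights[edge] * bias
    return biased


def cut_weight(weights: Dict[Edge, float], part_of: Dict[str, int]) -> float:
    """Total weight of edges whose endpoints fall in different partitions."""
    total = 0.0
    for edge, w in weights.items():
        u, v = tuple(edge)
        if part_of[u] != part_of[v]:
            total += w
    return total


# Toy input: a 4-cycle a-b-c-d-a with unit edge weights.
weights = {
    frozenset({"a", "b"}): 1.0,
    frozenset({"b", "c"}): 1.0,
    frozenset({"c", "d"}): 1.0,
    frozenset({"d", "a"}): 1.0,
}

# Assumed instances of a frequent subgraph (here, single edges a-b and c-d).
instances = [{"a", "b"}, {"c", "d"}]

# Two balanced candidate partitions of the vertices.
keep_together = {"a": 0, "b": 0, "c": 1, "d": 1}  # instances stay intact
split_apart = {"a": 0, "d": 0, "b": 1, "c": 1}    # both instances are cut

biased = bias_edge_weights(weights, instances)

print(cut_weight(weights, keep_together), cut_weight(weights, split_apart))
print(cut_weight(biased, keep_together), cut_weight(biased, split_apart))
```

With unit weights the two candidate partitions tie (both cut weight 2.0), so a partitioner has no reason to prefer either. After biasing, the partition that keeps the instances intact still costs 2.0 while the one that splits them costs 20.0, so a cut-minimizing partitioner would keep the frequent subgraphs within a single partition.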

Keywords/Search Tags:

Graph-based data mining, Data partitioning, Frequent subgraphs, Distributed, Graph based data

Related items

1	Efficient Algorithm For Mining Dense Subgraphs In Uncertain Graph
2	Research Of Neglected Conditions Defects Discovery Method Based On Graph Data Mining
3	Research On The Algorithm For Mining Structured Data
4	Research On Uncertain Frequent Graph Data Mining
5	Algorithms For Mining Frequent Subgraphs And Its Applications In Biological Network
6	Research On Partitioning Approach For Large-scale Linked Data Based Property Graphs
7	Research On Mining Maximal Frequent Subgraphs Approach
8	Research On The Frequent Graph Mining Algorithm For Graph Classification
9	Mining Frequent Subgraph Based On Pre-clipping In Uncertain Graph Databases
10	Research On RDF Stream Data Partitioning Algorithm Based On Graph Model