Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset

Posted on:2010-02-19

Degree:M.S.

Type:Thesis

University:Kutztown University of Pennsylvania

Candidate:Peterson, Angela R

GTID:2448390002488925

Subject:Statistics

Abstract/Summary:

This thesis examines the use of parallel coordinate (PC) plots for visual data mining, concentrating on PC plots of multidimensional data sets. The "polyline" and the parallel axis, the basic building blocks of a parallel coordinate plot, are defined. Visualization problems with parallel coordinate plots typically involve ambiguity and clutter, and both are addressed with the technique of "clustering and color": color reduces ambiguity, while separating the data set into natural groups, or clusters, reduces clutter. A methodology is outlined for clustering and coloring a multidimensional data set. The K-means clustering algorithm is introduced and applied to produce clusters of polylines in a PC plot. The "K" in K-means is the number of clusters, and its value is chosen by the user. In the spirit of graphical visualization, the "distortion plot" is introduced as a way to select the "best" value of K. Once the methodology for graphing a meaningful parallel coordinate plot is outlined, it is illustrated with an analysis of a real multidimensional data set. The thesis finishes with a summary of the effectiveness and applications of visual data mining using a series of PC plots with clustering and color.
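
The clustering-and-color idea summarized above can be sketched in a few lines of Python. This is an illustrative sketch only, not the thesis's own code: the sample data, the function names (`scale_axes`, `sq_dist`, `kmeans`), the color palette, and the min-max scaling of each axis are all assumptions made for the example. "Distortion" is assumed here to mean the sum of squared distances from each point to its assigned cluster center; the thesis may define it differently.

```python
import random

# Hypothetical sample data: 6 records, 3 dimensions (not from the thesis).
data = [
    [1.0, 10.0, 100.0], [1.2, 11.0, 98.0], [0.9, 9.5, 102.0],
    [5.0, 2.0, 20.0], [5.2, 1.5, 22.0], [4.8, 2.2, 19.0],
]

def scale_axes(rows):
    """Rescale every dimension to [0, 1] so all parallel axes share one height."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant columns
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Basic Lloyd-style K-means; returns (labels, centroids, distortion)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    # Assumed definition of distortion: total squared distance to assigned centroid.
    distortion = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, distortion

scaled = scale_axes(data)

# One polyline per record: vertices are (axis index, height on that axis).
polylines = [list(enumerate(row)) for row in scaled]

# Cluster with a user-chosen K and give every polyline its cluster's color.
labels, centroids, _ = kmeans(scaled, k=2)
PALETTE = ["blue", "orange", "green", "red"]  # arbitrary example palette
colors = [PALETTE[lab] for lab in labels]

# Values for a distortion plot: distortion against K for several candidate K.
distortions = {k: kmeans(scaled, k)[2] for k in range(1, 5)}
for k, d in distortions.items():
    print(k, round(d, 4))
```

Each entry of `polylines` could then be drawn with any plotting library, one line per record in the color from `colors`. Plotting `distortions` against K gives a curve on which a value of K near the point where the curve flattens is a reasonable candidate for the "best" number of clusters.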

Keywords/Search Tags:

Visual data mining, Parallel coordinate plot, Clustering and color, K-means

Related items

1	The Research Of K-means Clustering Algorithm In Data Mining
2	Research On Visual Analysis Of High Dimensional Data
3	The Research On Volume Rendering Visual Analytics For Multi Seismic Attributes Data
4	Research On Parallel Optimization Of Clustering Algorithms In Data Mining
5	The Research On Visual Data Mining Technology Based On Parallel Coordinates
6	A Fast And Efficient Parallel Bisecting K-Means Algorithm
7	Research On K-MEANS Algorithm Based On GPU Parallel And Its Application In Text Clustering
8	Research And Implementation Of Visualization For Cluster Process Based On Parallel Coordinates
9	Semi-supervised K-means Clustering Algorithm In Data Mining
10	Clustering Algorithm And Analysis Of Customer Loyalty