Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset

Posted on: 2010-02-19
Degree: M.S.
Type: Thesis
University: Kutztown University of Pennsylvania
Candidate: Peterson, Angela R
Full Text: PDF
GTID: 2448390002488925
Subject: Statistics
Abstract/Summary:
This thesis examines the use of parallel coordinate (PC) plots for visual data mining, concentrating on PC plots of multidimensional data sets. It defines the "polyline" and the parallel axis, the basic building blocks of a parallel coordinate plot. Visualization problems with parallel coordinate plots typically involve ambiguity and clutter. Both are addressed with the technique of "clustering and color": color reduces ambiguity, and separating the data set into natural groups, or clusters, reduces clutter. A methodology is outlined for clustering and coloring a multidimensional data set. The K-means clustering algorithm is introduced and applied to produce clusters of polylines in a PC plot. The "K" in K-means is the number of clusters, and its value is chosen by the user. In the spirit of graphical visualization, the "distortion plot" is introduced as a way to select the "best" value of K. Once the methodology for graphing a meaningful parallel coordinate plot is outlined, it is illustrated with an analysis of a real multidimensional data set. The thesis closes with a summary of the effectiveness and applications of visual data mining using a series of PC plots with clustering and color.
Keywords/Search Tags:Visual data mining, Parallel coordinate, Plot, Using, Clustering and color, K-means
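
The abstract describes clustering a data set with K-means and using a distortion plot to choose K. The following is a minimal sketch of that idea, not the thesis's own implementation. Several details are assumptions made for illustration: the toy data in `points`, the helper names, the deterministic farthest-first initialization (used here instead of a random start so the result is repeatable), and the definition of distortion as the sum of squared distances from each point to its assigned centroid.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means iteration. Returns (labels, centroids, distortion)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Distortion (assumed definition): total squared distance to centroids.
    distortion = sum(squared_distance(p, centroids[lab])
                     for p, lab in zip(points, labels))
    return labels, centroids, distortion


# Hypothetical toy data: three well-separated groups in three dimensions.
points = [
    (1.0, 2.0, 1.5), (1.2, 1.8, 1.4), (0.9, 2.1, 1.6),
    (5.0, 6.0, 5.5), (5.2, 5.9, 5.4), (4.8, 6.1, 5.6),
    (9.0, 1.0, 9.5), (9.1, 1.2, 9.4), (8.9, 0.9, 9.6),
]

# Distortion for each candidate K; plotting these values against K gives
# the kind of curve a distortion plot shows.
distortions = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, d in distortions.items():
    print(f"K={k}: distortion={d:.3f}")

# Cluster labels for the chosen K; each label could select a polyline color.
labels, centroids, _ = kmeans(points, 3)
print(labels)
```

Plotted against K, the distortion values typically drop sharply until K reaches the number of natural groups and flatten afterwards; that bend is what a reader would look for when picking K. The resulting labels could then be passed as the class column to a plotting routine such as `pandas.plotting.parallel_coordinates`, so that each cluster's polylines are drawn in a single color.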