Clustering algorithms for data and knowledge exploration

Posted on:2004-02-07

Degree:Ph.D

Type:Dissertation

University:The University of Iowa

Candidate:Gan, Yuan

GTID:1468390011468552

Subject:Engineering

Abstract/Summary:

Data mining, also referred to as knowledge discovery in databases (KDD), has received considerable attention in industrial engineering due to its applicability in industry. Among the many data mining methods, clustering is attractive because it is simple to understand and easy to implement. However, existing clustering algorithms are restricted to solving a limited range of data mining problems. Four limitations arise from the feature type, the number of feature sets, the number of instances, and the cluster shape. In this research, a new clustering approach based on similarity measures is developed.

The clustering process is partitioned into three steps: similarity definition, feature preparation, and clustering. In the similarity definition step, different types of similarity measures are considered, e.g., point-point, point-set, and set-set similarity measures, as well as measures for categorical features and summarized features. The defined similarity measures are used in the final two steps. The major purpose of the feature preparation step is to transform (integrate, discretize, etc.) feature sets and to remove irrelevant and redundant features. A clustering algorithm then determines clusters using the defined similarity measures. A new feature selection method based on similarity measures is proposed. Unlike traditional feature selection methods, the proposed algorithm is based on both discrimination and a similarity measure: the selected feature subsets have the same discrimination power as the original feature set and the minimum value of the corresponding similarity measure. The concept of mutual bonds and the triangularization algorithm are key to the new clustering algorithm, which has low computational time complexity and a low intermediate storage requirement. Finally, the proposed clustering method addresses clustering problems with irregularly shaped clusters, which many existing clustering algorithms cannot handle. A new algorithm for efficiently finding the minimum spanning tree is developed. The edge lengths of the tree are defined by the similarity measure, and clusters are formed by separating the minimum spanning tree.

The main contribution of this research is the development of a formal approach for clustering. Similarity measures are critical to this approach. Computational experience with the proposed approach on various data sets (benchmark data sets and industrial data sets) has demonstrated its efficiency, validity, and reliability.
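
The idea of forming clusters by separating a minimum spanning tree can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the abstract does not give the new spanning-tree method or the similarity measures, so the sketch assumes the textbook Prim's algorithm, Euclidean distance as a stand-in dissimilarity, and a simple "cut the k-1 longest edges" rule. The names `minimum_spanning_tree`, `mst_clusters`, and `euclidean` are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[float, int, int]  # (length, index of endpoint a, index of endpoint b)


def euclidean(a: Point, b: Point) -> float:
    """Assumed stand-in dissimilarity; the source does not specify its measure."""
    return math.dist(a, b)


def minimum_spanning_tree(
    points: Sequence[Point],
    dissimilarity: Callable[[Point, Point], float] = euclidean,
) -> List[Edge]:
    """Prim's algorithm on the complete graph over the points (O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known edge linking each point to the tree
    parent = [-1] * n       # tree endpoint of that shortest edge
    edges: List[Edge] = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Update the cheapest connection for every remaining point.
        for v in range(n):
            if not in_tree[v]:
                d = dissimilarity(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(
    points: Sequence[Point],
    k: int,
    dissimilarity: Callable[[Point, Point], float] = euclidean,
) -> List[int]:
    """Separate the tree by dropping its k-1 longest edges; return cluster labels."""
    n = len(points)
    edges = sorted(minimum_spanning_tree(points, dissimilarity))
    kept = edges[: max(0, len(edges) - (k - 1))]

    # Union-find over the kept edges gives the connected components.
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)

    # Number the components in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(data, k=2))
```

Because the tree links each point to a nearby neighbor rather than to a cluster center, this kind of separation can follow elongated or curved groups, which is one common motivation for spanning-tree clustering of irregularly shaped clusters. Passing a different `dissimilarity` function is where a custom measure, for example one for categorical features, would plug in.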

Keywords/Search Tags:

Data, Clustering, Similarity measure, Approach, Feature, Proposed

Related items

1	Implementation And Application Of Global-Relationship Similarity Measure In Clustering
2	The Approach To Mining Time-lagged Coregulated Gene And Research On Fuzzy Clustering Algorithm
3	The Research And Application Of Clustering Feature Selection Methods
4	Research On Feature Representation And Similarity Measure Methods In Time Series Data Mining
5	Evidence Reasoning Method Research Based On The ISODATA Clustering And Improved Similarity Measure
6	Research On Spectral Clustering With Improved Similarity Measure
7	Research On Intuitionistic Fuzzy Clustering Method: Based On Knowledge Measure Theory
8	Research And Application Of Spectral Clustering Based On Density Adaptive Neighborhood
9	Research For Feature Selection Algorithm Based On Text Clustering
10	Research On Clustering Of Uncertain Data