Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Interval Estimation

Posted on:2014-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Bing

GTID:2268330401462114

Subject:Computer software and theory

Abstract/Summary:

During data collection, the attribute values of some records may be partially missing or ambiguous, which makes the data incomplete. Clustering or querying incomplete data does not give accurate results, and this affects subsequent work. Processing incomplete data is therefore important, and it is also relatively complex.

To address the difficulty of clustering incomplete data, this paper presents a clustering method in two parts. The first part processes the incomplete data. To overcome the shortcomings of traditional range estimation, this paper proposes interval reconstruction, in which the estimated interval of each missing attribute is restricted on the basis of the nearest-neighbor interval. The incomplete data are first pre-classified to obtain pre-classification results. The nearest neighbors of each incomplete record are selected according to the correlation between the complete data and the incomplete data, and the interval of each missing attribute is determined from the corresponding attribute values of these neighbors. Neighbors that fall in a different class from the incomplete record, according to the pre-classification results, are then removed. The remaining neighbors are used to re-determine the intervals of the missing attributes, so that the incomplete data set is transformed into an interval data set.

In the second part, this paper proposes a hybrid particle swarm and fuzzy c-means clustering algorithm to cluster the processed incomplete data set. Each particle in the hybrid algorithm encodes the cluster centers. The memberships are still obtained from the gradient-based alternating iteration formula. The cluster centers are optimized by updating the velocity and position of the particles, after which the memberships and the objective function value are updated. Mutation is introduced into the iterative process of the hybrid algorithm so that the particle swarm can escape from local optima.
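The hybrid scheme described in the second part can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the inertia and acceleration constants (`w`, `c1`, `c2`), the mutation rule (re-drawing a particle at random with probability `p_mut`) and the use of squared Euclidean distance are all assumptions made for illustration. Interval data could be passed in by stacking the lower and upper bounds of each attribute into one row, which is also an assumption about the distance used.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Standard FCM membership update for fixed cluster centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)                 # guard against zero distance
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True), d2

def objective(X, centers, m=2.0):
    """FCM objective J = sum_ik u_ik^m * d_ik^2 for the given centers."""
    u, d2 = memberships(X, centers, m)
    return float(((u ** m) * d2).sum())

def pso_fcm(X, c, n_particles=20, n_iter=100, m=2.0,
            w=0.7, c1=1.5, c2=1.5, p_mut=0.1, seed=0):
    """Hypothetical PSO + FCM hybrid; returns (centers, memberships, J)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Each particle encodes one complete set of c cluster centers.
    pos = rng.uniform(lo, hi, size=(n_particles, c, d))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(X, p, m) for p in pos])
    g = int(pbest_val.argmin())
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Velocity and position update moves the encoded centers.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        # Assumed mutation rule: re-draw a few particles uniformly at random.
        mutate = rng.random(n_particles) < p_mut
        pos[mutate] = rng.uniform(lo, hi, size=(int(mutate.sum()), c, d))
        # Memberships and objective are recomputed from the new centers.
        vals = np.array([objective(X, p, m) for p in pos])
        better = vals < pbest_val
        pbest[better] = pos[better]
        pbest_val[better] = vals[better]
        g = int(pbest_val.argmin())
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    u, _ = memberships(X, gbest, m)
    return gbest, u, gbest_val
```

Calling `pso_fcm(X, c=3)` on an `(n, d)` array returns the best centers found, the membership matrix and the objective value; a hard partition can be read off with `u.argmax(axis=1)`.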
The global optimization ability of the particle swarm is then used to find the optimal result, which is taken as the best clustering result.

Finally, four data sets from the UCI database (Iris, Wine, Bupa and Haberman) are used in experiments. The proposed methods are compared with five other methods: Whole Data Strategy (WDS), Partial Distance Strategy (PDS), Optimal Completion Strategy (OCS), Nearest Prototype Strategy (NPS) and Nearest-neighbor Interval (NNI). The experimental results show that the proposed methods give more accurate clustering results and are more robust.
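For the first part of the method, the interval reconstruction step can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the thesis: missing values are marked with `NaN`, neighbors are drawn from the complete records only, a rescaled partial distance stands in for the correlation measure mentioned in the abstract, the pre-classification labels are supplied by the caller, and the names `partial_distance`, `nn_intervals` and `q` are hypothetical. At least one complete record is assumed to exist.

```python
import numpy as np

def partial_distance(a, b):
    """Distance over attributes present in both records, rescaled to full dimension."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return np.inf
    return np.sqrt(((a[mask] - b[mask]) ** 2).sum() * a.size / mask.sum())

def nn_intervals(X, pre_labels, q=5):
    """Return (lower, upper) bounds; known values give zero-width intervals."""
    X = np.asarray(X, dtype=float)
    pre_labels = np.asarray(pre_labels)
    lower, upper = X.copy(), X.copy()
    complete = ~np.isnan(X).any(axis=1)
    comp_idx = np.flatnonzero(complete)
    for i in np.flatnonzero(~complete):
        # Select the q nearest complete records.
        dists = np.array([partial_distance(X[i], X[j]) for j in comp_idx])
        nbrs = comp_idx[np.argsort(dists)[:q]]
        # Drop neighbors whose pre-classification differs from record i.
        same = nbrs[pre_labels[nbrs] == pre_labels[i]]
        if same.size:            # assumed fallback: keep all neighbors if none match
            nbrs = same
        # Re-determine the interval of each missing attribute.
        for k in np.flatnonzero(np.isnan(X[i])):
            lower[i, k] = X[nbrs, k].min()
            upper[i, k] = X[nbrs, k].max()
    return lower, upper
```

Filtering by pre-classification narrows the interval whenever a nearby record from another class carries an outlying value for the missing attribute, which is the effect the abstract attributes to interval reconstruction.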

Keywords/Search Tags:

fuzzy clustering, incomplete data sets, nearest-neighbor interval, particle swarm, hybrid optimization

Related items

1	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On The Improved ACO With Interval Supervision
2	Clustering Incomplete Data Using Pseudo Nearest Neighbor And Interval-valued Distance
3	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Improved BP Imputation
4	Research Of Fuzzy Clustering Algorithm For Optimizing Incomplete Data Based On Extreme Learning Machine
5	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Information Feedback RBF Network Valuation
6	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Local Weighting
7	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Interval Analysis
8	Research Of Weighted Clustering Algorithm For Incomplete Data Based On Adaptive Interval
9	Research And Implementation Of Incomplete Data Processing Based On AP Clustering
10	Prediction Of Moving Objects' K-Nearest Neighbor Based On Fuzzy-Rough Sets