
Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Interval Estimation

Posted on: 2014-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Bing
GTID: 2268330401462114
Subject: Computer software and theory
Abstract/Summary:
During data collection, the attribute values of some records are partially missing or blurred, which makes the data incomplete. Clustering or querying incomplete data does not give accurate results, and this affects subsequent work. Processing incomplete data is therefore important, and also relatively complex.

To address the difficulty of clustering incomplete data, this paper presents an effective clustering method. Incomplete data clustering is divided into two parts. The first part is processing the incomplete data: this paper proposes interval reconstruction to address the shortcomings of traditional range estimation. The estimated interval range of the incomplete data is restricted based on the nearest-neighbor interval. That is, the incomplete data are pre-classified before processing, which gives the pre-classification results. The nearest neighbors of each incomplete datum are selected according to the correlation between the complete data and the incomplete data. The interval ranges of the missing attributes are determined by the corresponding attribute values of these nearest neighbors. Neighbors that fall in a different class from the incomplete datum are then removed based on the pre-classification results. The remaining neighbors are used to re-determine the intervals of the missing attributes, and thus the incomplete data set is transformed into an interval data set.

In the second part, this paper proposes a hybrid particle swarm and fuzzy c-means clustering algorithm to cluster the processed incomplete data set. Particles in the hybrid clustering algorithm are encoded by the cluster centers. The memberships are still obtained by the gradient-based alternating iterative formula. The cluster centers are optimized by updating the velocity and position of the particles, after which the memberships and the objective function value are updated. A mutation step is inserted into the iterative process of the hybrid algorithm to allow the particle swarm to escape from local optima. The global optimization ability of the particle swarm is used to find the optimal results, which are taken as the best clustering results.

Finally, four data sets from the UCI database (Iris, Wine, Bupa and Haberman) are used in experiments. The experimental results of the methods in this paper are compared with five other methods: Whole Data Strategy (WDS), Partial Distance Strategy (PDS), Optimal Completion Strategy (OCS), Nearest Prototype Strategy (NPS) and Nearest-neighbor Interval (NNI). The experimental results show that the proposed methods give more accurate clustering results and better robustness.
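
The interval construction described in the first part of the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. Several points are assumptions made only for illustration: the abstract does not say how the "correlation" between records is measured, so a partial Euclidean distance is used here; the pre-classification labels are taken as given inputs, since the pre-classification method is not described; and the names `partial_distance`, `neighbor_intervals` and the parameter `q` are hypothetical. Missing values are represented by `None`.

```python
import math


def partial_distance(x, y):
    """Euclidean distance over attributes observed in both records,
    rescaled to the full attribute count (assumed similarity measure)."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return math.inf
    squared = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(squared * len(x) / len(shared))


def neighbor_intervals(record, record_class, complete, complete_classes, q=3):
    """Return {attribute index: (low, high)} for each missing attribute.

    record           -- incomplete record, None marks a missing value
    record_class     -- pre-classification label of the record (given)
    complete         -- list of complete records
    complete_classes -- pre-classification labels of the complete records
    q                -- number of nearest neighbors (hypothetical parameter)
    """
    # Rank complete records by distance to the incomplete record.
    order = sorted(range(len(complete)),
                   key=lambda i: partial_distance(record, complete[i]))
    neighbors = order[:q]
    # Remove neighbors whose pre-classification label differs.
    same_class = [i for i in neighbors if complete_classes[i] == record_class]
    # Assumption: fall back to all neighbors if filtering removes every one.
    kept = same_class or neighbors
    intervals = {}
    for j, value in enumerate(record):
        if value is None:
            column = [complete[i][j] for i in kept]
            intervals[j] = (min(column), max(column))
    return intervals


complete = [[1.0, 2.0, 3.0], [1.1, 2.2, 3.4], [1.2, 9.0, 3.1], [5.0, 6.0, 7.0]]
classes = [0, 0, 1, 1]
print(neighbor_intervals([1.05, None, 3.2], 0, complete, classes, q=3))
```

In this toy data the three nearest neighbors include one record from a different class whose second attribute is 9.0. Without the class filter the interval for the missing attribute would be (2.0, 9.0); removing that neighbor narrows it to (2.0, 2.2), which is the effect the abstract attributes to the pre-classification step.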
Keywords/Search Tags: fuzzy clustering, incomplete data sets, nearest-neighbor interval, particle swarm, hybrid optimization