Study On Clustering For Incomplete Data Based On Sample Weighting And Cluster Dispersion

Posted on:2017-03-02

Degree:Master

Type:Thesis

Country:China

Candidate:H W Liu

Full Text:PDF

GTID:2348330488459831

Subject:Control engineering

Abstract/Summary:

The fuzzy c-means clustering algorithm has been widely used in data mining and pattern classification. However, the traditional algorithm cannot be applied directly to incomplete data sets, and it does not take into account the impact of outliers, the cluster dispersion, or the uncertainty of missing values, which may lead to misclassification.

In this thesis, in view of the impact of outliers, we propose a sample-weighted fuzzy c-means clustering algorithm for incomplete data. First, the weight of each sample is obtained from the distance between the sample and the cluster prototype. The weight then determines how much influence the sample has on the clustering result: the greater the weight, the greater the influence of the sample, so that the impact of outliers is weakened.

In view of the cluster dispersion and the uncertainty of missing values, we propose an interval-valued fuzzy c-means clustering algorithm based on cluster dispersion. First, the nearest neighbors of each incomplete sample are found and the missing values are represented by nearest-neighbor intervals. Then, the cluster dispersion for each cluster prototype is calculated from the mean square deviation between the samples and the cluster prototype, and it is taken into account in the clustering process. Using interval-valued data together with cluster dispersion reduces the misclassification of marginal data objects and reflects the uncertainty of the missing values.

Results on several UCI data sets show that the first approach weakens the impact of outliers and brings the cluster prototype closer to the real cluster prototype, while the second approach reduces the misclassification of marginal data objects and makes full use of the information in the data sets. Compared with other approaches, our approaches are more effective on incomplete data sets.
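The nearest-neighbor interval step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so several choices here are assumptions rather than the thesis's method: missing values are marked with `None`, neighbors are ranked by a partial distance computed over the attributes both samples have observed, the neighbor count `q` is a free parameter, and the interval for a missing attribute is taken as the minimum and maximum of that attribute over the `q` nearest neighbors. The function names are hypothetical.

```python
import math


def partial_distance(a, b):
    """Partial distance between two samples; None marks a missing value.

    Assumption: only attributes observed in both samples are compared, and
    the squared sum is rescaled by (total attributes / shared attributes).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no attribute in common, so the pair is incomparable
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def nearest_neighbor_intervals(data, q=2):
    """Convert every value to an interval (low, high).

    Observed values become degenerate intervals (v, v). A missing value in
    attribute j becomes (min, max) of attribute j over the q nearest samples
    that have attribute j observed.
    """
    result = []
    for i, sample in enumerate(data):
        row = []
        for j, value in enumerate(sample):
            if value is not None:
                row.append((value, value))
                continue
            # Candidate neighbors: other samples with attribute j observed.
            candidates = [
                (partial_distance(sample, other), other[j])
                for k, other in enumerate(data)
                if k != i and other[j] is not None
            ]
            if not candidates:
                raise ValueError(f"attribute {j} is missing in every other sample")
            candidates.sort(key=lambda pair: pair[0])
            neighbor_values = [v for _, v in candidates[:q]]
            row.append((min(neighbor_values), max(neighbor_values)))
        result.append(row)
    return result


# Toy data: the last sample is missing its second attribute.
data = [
    [1.0, 2.0],
    [1.2, 2.2],
    [5.0, 6.0],
    [1.1, None],
]
intervals = nearest_neighbor_intervals(data, q=2)
print(intervals[3])  # [(1.1, 1.1), (2.0, 2.2)]
```

In this toy example the incomplete sample lies close to the first two samples, so its missing value is bounded by their values (2.0 and 2.2) and the distant third sample does not widen the interval. The resulting interval-valued data would then be passed to an interval-valued fuzzy c-means procedure, which is not shown here.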

Keywords/Search Tags:

Fuzzy Clustering, Sample Weighting, Cluster Dispersion, Interval

Related items

1	Research On Fuzzy Clustering Algorithm Of Sample And Feature Weighting
2	Research Of Weighted Clustering Algorithm For Incomplete Data Based On Adaptive Interval
3	Research On Fuzzy Clustering Based On Weighting With Cluster Center Separation
4	Research On Fuzzy Clustering Mining Technology And Its Application In Automatic Raising Pigs
5	Adaptive Fuzzy Clustering Algorithm And Its Application In Intrusion Detection
6	Study Of Weighting Fuzzy Clustering Algorithm Based On Generalized Entropy
7	Research On Attribute Weighted And Incomplete Data Fuzzy Clustering Approaches
8	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Interval Analysis
9	The Research Of Clustering Algorithm For Interval-Valued Intuitionistic Fuzzy Sets
10	The Research On Knowledge-Driven Fuzzy Clustering Algorithm