Clustering Incomplete Data Using Pseudo Nearest Neighbor And Interval-valued Distance

Posted on:2017-08-05

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Chen

Full Text:PDF

GTID:2348330488459744

Subject:Control theory and control engineering

Abstract/Summary:

Handling missing data is a challenging problem that arises frequently in data analysis and pattern classification. Data sets can be incomplete as a result of random noise, human error, and other causes, and traditional clustering methods are not directly applicable to such incomplete data. If not handled properly, incomplete data may lead to large errors or biased clustering results. In this paper, we study clustering algorithms for incomplete data using pseudo nearest neighbors and interval-valued distance. Results on several incomplete data sets demonstrate the effectiveness of the proposed algorithms. The main work includes:

1. To address the uncertainty of missing attribute values, a fuzzy c-means clustering algorithm based on pseudo-nearest-neighbor intervals of incomplete data is given. The data are first completed using the pseudo-nearest-neighbor interval approach, and the resulting data set is then clustered with the fuzzy c-means algorithm for interval-valued data. The proposed algorithm estimates the missing attribute values without normalization and thus captures the essence of the pattern similarities in the original, untouched data set. In addition, the pseudo-nearest-neighbor interval representation accounts for the implicit uncertainty of missing attribute values and also considers the angle between an incomplete datum and the other data.

2. To address distance calculation when attribute values are missing, a fuzzy c-means clustering algorithm for incomplete data using the triangle inequality is proposed. First, an interval representation of distance based on the triangle inequality is presented, which can be used to measure the distance between incomplete data and prototypes. The proposed interval distance makes full use of neighborhood information in incomplete data sets and can represent the uncertainty of missing attribute values to some degree. The triangle inequality also helps to estimate the range of the interval. Then a clustering algorithm based on the proposed distance is given. The proposed algorithm clusters the incomplete data without elimination or imputation and can thus avoid the possible accumulation and propagation of errors through the iterative optimization procedure.
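
The two interval ideas in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not give the pseudo-nearest-neighbor rule (which also considers angles) or the exact interval distance, so the code below substitutes a plain k-nearest-neighbor interval built on a partial distance, plus the generic triangle-inequality bounds. The function names `partial_distance`, `neighbor_intervals`, and `triangle_bounds`, the use of `None` for missing values, and the parameter `k` are all assumptions made for illustration.

```python
import math


def partial_distance(a, b):
    """Euclidean distance over attributes observed in both rows.

    Missing values are None. The squared sum is rescaled by
    (total attributes / shared attributes). Returns None when the
    two rows share no observed attribute.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def neighbor_intervals(data, k):
    """Turn every value into an interval (low, high).

    Observed values become degenerate intervals (v, v). A missing
    value becomes (min, max) of that attribute over the k nearest rows
    (by partial distance) in which the attribute is observed.
    Assumption: plain nearest neighbors, not the pseudo-nearest-neighbor
    rule of the thesis.
    """
    result = []
    for i, row in enumerate(data):
        intervals = []
        for j, value in enumerate(row):
            if value is not None:
                intervals.append((value, value))
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                d = partial_distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            if not candidates:
                raise ValueError(f"no neighbor observes attribute {j}")
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            intervals.append((min(nearest), max(nearest)))
        result.append(intervals)
    return result


def triangle_bounds(d_xy, d_yv):
    """Bounds on d(x, v) implied by the triangle inequality.

    For any metric: |d(x, y) - d(y, v)| <= d(x, v) <= d(x, y) + d(y, v).
    How the thesis chooses the reference point y is not stated in the
    abstract; this shows only the inequality itself.
    """
    return (abs(d_xy - d_yv), d_xy + d_yv)


if __name__ == "__main__":
    data = [
        [1.0, 2.0],
        [1.2, 2.2],
        [5.0, 6.0],
        [1.1, None],  # second attribute is missing
    ]
    print(neighbor_intervals(data, k=2)[3])
    print(triangle_bounds(3.0, 5.0))
```

In this toy data set, the incomplete row is closest to the first two rows on its observed attribute, so its missing value is bracketed by their values on the second attribute. An interval-valued fuzzy c-means step would then cluster these intervals; that step is omitted here.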

Keywords/Search Tags:

Pseudo nearest neighbor, Fuzzy c-means, Incomplete data, Clustering

Related items

1	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Local Weighting
2	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Interval Estimation
3	Research On Density Peaks Clustering Algorithm Based On Nearest-Neighbor Optimization
4	Research And Implementation Of Incomplete Data Processing Based On AP Clustering
5	Study On Generalized Nearest Neighbor Pattern Classification
6	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Improved BP Imputation
7	Research Of Fuzzy Clustering Algorithm For Optimizing Incomplete Data Based On Extreme Learning Machine
8	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Interval Analysis
9	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Information Feedback RBF Network Valuation
10	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Improved VAEGAN