
Research Of Weighted Clustering Algorithm For Incomplete Data Based On Adaptive Interval

Posted on: 2020-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: M H Niu | Full Text: PDF
GTID: 2428330578450930 | Subject: Computer application technology
Abstract/Summary:
As society moves into the information age, it is also entering the era of big data, and data plays an increasingly important role in people's lives and work. During data collection, however, noise, collection failures, and other causes often leave values missing, which produces incomplete data sets. Traditional methods such as expectation maximization, weighted estimation equations, and K-nearest neighbors cannot meet current requirements for clustering accuracy, so improving the clustering accuracy of incomplete data has long been an active research topic for scholars at home and abroad.

First, because the fuzzy C-means algorithm (FCM) cannot handle incomplete data directly, this paper proposes an adaptive-interval fuzzy clustering algorithm for incomplete data (AI-IFCM). An attribute correlation distance is computed between the sample to be filled and every other sample in order to determine the neighbor sample set of the incomplete sample. The number of nearest-neighbor samples is selected by the nearest-neighbor rule, and the range of attribute values in the neighbor sample set gives the upper and lower limits of the interval that fills the missing attribute. By default, the center of the interval is the median of the nearest neighbors' values for the missing attribute. To further reduce the error that the interval introduces into fuzzy clustering, an interval factor adjusts the interval size: the dispersion among the neighbor samples is calculated, and the interval factor is determined from the interval and its central value. The resulting interval data set is then passed to interval fuzzy C-means (IFCM) for cluster analysis.

Second, because outlying samples reduce the accuracy of fuzzy clustering, this paper proposes an interval sample-weighted fuzzy clustering algorithm for incomplete data (AI-WIFCM). To reflect each sample's contribution to the cluster centers, sample weights are added during the iterative process of the algorithm. The paper first identifies the limitations of traditional sample weights and proposes a new method for assigning them. Then, building on the adaptive interval data set, it extends the calculation of sample weights to interval data. Interval fuzzy C-means is improved by introducing the weights of interval data samples into the iterations, which benefits the selection of cluster centers and increases clustering accuracy.

Finally, the experiments use three data sets from the UCI database (the biological data set Iris, the medical breast cancer data set Breast, and the medical adult liver disease data set Bupa) together with two artificially generated data sets, the regular data set ONE and the irregular data set TWO. The proposed algorithms are compared with WDS-FCM, PDS-FCM, OCS-FCM, and other algorithms under four missing rates, and the principles of the algorithms are analyzed and compared. The results show that the proposed algorithm achieves higher clustering accuracy.
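
The interval-filling step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not define the attribute correlation distance or the adaptive interval factor, so the sketch substitutes a plain partial distance over shared attributes and a fixed, hypothetical shrink parameter `alpha`. The names `partial_distance`, `fill_interval`, `k`, and `alpha` are all assumptions made for illustration.

```python
import math
from statistics import median


def partial_distance(a, b):
    """Distance over attributes observed in both samples (None marks missing).

    Assumption: stands in for the thesis's attribute correlation distance,
    which the abstract does not define.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    # Rescale so samples that share fewer attributes stay comparable.
    scale = len(a) / len(shared)
    return math.sqrt(scale * sum((x - y) ** 2 for x, y in shared))


def fill_interval(data, i, j, k=3, alpha=1.0):
    """Return (lower, upper) for missing attribute j of sample i.

    k is the neighbor count. alpha in [0, 1] is a hypothetical fixed factor
    that shrinks the interval toward the neighbor median; the thesis instead
    derives its factor adaptively from the neighbors' dispersion.
    """
    candidates = [
        (partial_distance(data[i], row), row[j])
        for n, row in enumerate(data)
        if n != i and row[j] is not None
    ]
    candidates.sort(key=lambda c: c[0])
    values = [v for _, v in candidates[:k]]
    center = median(values)  # default interval center: neighbor median
    lower = center - alpha * (center - min(values))
    upper = center + alpha * (max(values) - center)
    return lower, upper


data = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [5.0, None, 1.5],  # attribute 1 is missing
    [6.7, 3.1, 4.7],
    [6.3, 2.5, 5.0],
]
print(fill_interval(data, i=2, j=1, k=2))  # prints (3.0, 3.5)
```

With `alpha=1.0` the interval spans the full range of the neighbors' values; smaller values narrow it around the median, which mimics the role the abstract assigns to the interval factor without reproducing how the thesis computes it.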
Keywords/Search Tags: Incomplete data, interval fuzzy C-means, adaptive interval, sample weighting