
Research On Incomplete Data FCM Clustering And Outlier Detection

Posted on: 2019-05-05; Degree: Master; Type: Thesis
Country: China; Candidate: R X Xu; Full Text: PDF
GTID: 2428330548994842; Subject: Applied Mathematics
Abstract/Summary:
With advances in science and technology, ever more data is generated in daily life and work. Analyzing and processing large amounts of data to build useful models and to predict unknown data has become a topic of wide interest. Fuzzy c-means clustering (FCM) is a classical clustering method. Although FCM and its improved variants have been widely applied in many fields, they still have many deficiencies when clustering incomplete data and when detecting outliers in class-imbalanced data.

Existing incomplete data clustering methods fall mainly into two kinds. The first deletes data objects with missing values before clustering. This reduces the amount of data but destroys the structure and integrity of the data set. The second computes the distance between data objects using the partial distance, which considers only the differences between known attribute values. If an attribute value is unknown in either of two data objects, that attribute contributes nothing to the partial distance, as if the two objects had identical values there. Such a method therefore cannot produce accurate clustering results. Building on FCM, this thesis proposes an incomplete data clustering method by constructing a neighborhood information model of the incomplete data. To solve the outlier detection problem for imbalanced data, it also proposes a new clustering-based outlier detection method. The specific research contents are as follows.

Most existing fuzzy c-means clustering methods seldom consider the uncertainty of missing attributes. Consequently, a fuzzy c-means clustering method for incomplete data sets based on neighbor information, named NFCM, is proposed. The method constructs an effective model of neighborhood information and incorporates it into the objective function of the optimized complete clustering method. Each missing value is treated as an additional variable and solved for with the Lagrange multiplier method. Through a three-level alternating iteration, the data are clustered while the missing values are estimated. The proposed NFCM algorithm is compared with four incomplete data clustering algorithms on three UCI data sets, and the experimental results demonstrate that it can not only estimate the missing values effectively but also achieve better clustering performance on incomplete data.

To address the problem that clustering-based outlier detection cannot accurately detect outliers in imbalanced data, this thesis introduces class information and class scatter into neighborhood fuzzy c-means clustering to construct a clustering method for imbalanced data. This method not only accounts for the uneven distribution of data objects within a class but also effectively handles the clustering of class-imbalanced data. Drawing on local outlier detection, which relates a data object to its neighboring points, an outlier detection method based on dispersion fuzzy c-means clustering, named OCWFCM, is proposed. In this method, each data object in the data set is assigned a local outlier score. Following the top-N principle, the N data objects with the largest outlier scores are taken as outliers. OCWFCM is compared with commonly used outlier detection methods, and simulation results show its effectiveness and advantages.
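
Two of the building blocks mentioned in the abstract, the partial distance and top-N outlier selection, can be illustrated with a minimal Python sketch. This is not the thesis's NFCM or OCWFCM implementation. The function names, the use of `None` to mark a missing value, and the rescaling factor (total attributes divided by compared attributes, as in the commonly used partial distance strategy) are assumptions made for illustration, and the outlier scores passed to `top_n_outliers` are hypothetical inputs, since the abstract does not give the scoring formula.

```python
import math
from typing import List, Optional, Sequence

# Assumption: None marks a missing attribute value.
Vector = Sequence[Optional[float]]


def partial_distance(x: Vector, y: Vector) -> float:
    """Partial distance between two objects that may have missing values.

    Only attributes known in both objects are compared. The squared sum is
    rescaled by (total attributes / compared attributes); this scaling is an
    assumption taken from the common partial distance strategy.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    known = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not known:
        raise ValueError("no attribute is known in both objects")
    squared = sum((a - b) ** 2 for a, b in known)
    return math.sqrt(len(x) / len(known) * squared)


def top_n_outliers(scores: Sequence[float], n: int) -> List[int]:
    """Indices of the n objects with the largest outlier scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]
```

In this sketch, `partial_distance([1.0, None], [1.0, 9.0])` is zero even though the second attribute of the first object is unknown, which is the weakness of the partial distance that the abstract describes.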
Keywords/Search Tags:Fuzzy c-means clustering, Outlier detection, Incomplete data, Neighborhood information, Class dispersion