Study On Fuzzy Clustering For Incomplete Data Based On Probability Model Of Missing Attribute Values

Posted on: 2018-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: G X Liu
GTID: 2348330536961576
Subject: Control engineering
Abstract/Summary:
Fuzzy clustering has been widely used in image processing, pattern recognition, and other fields. Traditional clustering methods can only be applied to complete data sets and cannot be applied directly to incomplete ones. In practical applications, however, data are often incomplete for various reasons, and the way missing attributes are handled has a significant effect on the clustering result. Clustering of incomplete data sets is therefore a problem of practical importance.

The nearest-neighbor interval can account for the uncertainty of a missing attribute value, but it does not fully exploit the attribute values of the neighboring samples and cannot reflect how those values are distributed. Building on the nearest-neighbor interval description of a missing attribute value, this thesis treats the distribution of attribute values within the nearest-neighbor range as probability information and establishes a simple and effective probability model (PM) for missing attribute values. Clustering is then carried out with a genetic algorithm and with a gradient descent method. The genetic algorithm uses the probability values to generate the initial population and to perform mutation. The gradient descent method uses the probabilities of the missing attribute values to determine the search step. Both algorithms run a probability-guided search for the missing attribute estimates within the corresponding nearest-neighbor intervals so as to minimize the clustering objective function. The data set is then restored with the optimized estimates and clustered with FCM, which solves the fuzzy clustering problem for incomplete data.

The proposed probability model not only introduces nearest-neighbor information into the description of a missing attribute, but also fully exploits the distribution of the corresponding attribute values within the nearest-neighbor range, so it can effectively "restore" the missing attribute value. The genetic algorithm has good global search ability and better stability, while the gradient descent method searches quickly and can rapidly find a good solution; both obtain good clustering results. Simulation experiments on multiple UCI data sets show that the probability model is an effective way to describe the missing attribute values of incomplete data and that the clustering results are stable.
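To make the idea of a probability model over a nearest-neighbor interval more concrete, the following Python sketch shows one way it might be built. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the distance measure, the number of neighbors, or how the distribution is represented, so the partial-distance rule, the equal-width histogram, and all function names (`partial_distance`, `neighbor_values`, `probability_model`, `sample_estimate`) are hypothetical choices. The sampling step only hints at how probability values could seed a genetic algorithm's initial population.

```python
import numpy as np

def partial_distance(a, b):
    # Assumption: compare only attributes observed in both samples and
    # rescale to the full dimension (the abstract names no distance measure).
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return np.inf
    return np.sqrt(a.size / shared.sum() * np.sum((a[shared] - b[shared]) ** 2))

def neighbor_values(X, i, j, q):
    # Values of attribute j taken from the q nearest samples to X[i]
    # that actually have attribute j observed.
    candidates = [k for k in range(len(X)) if k != i and not np.isnan(X[k, j])]
    candidates.sort(key=lambda k: partial_distance(X[i], X[k]))
    return X[candidates[:q], j]

def probability_model(values, bins=3):
    # Nearest-neighbor interval [lo, hi] plus the share of neighbor values
    # falling into each equal-width sub-interval (hypothetical binning).
    lo, hi = float(values.min()), float(values.max())
    counts, edges = np.histogram(values, bins=bins, range=(lo, hi))
    return (lo, hi), edges, counts / counts.sum()

def sample_estimate(edges, probs, rng):
    # Pick a sub-interval by its probability, then draw uniformly inside it.
    b = rng.choice(len(probs), p=probs)
    return rng.uniform(edges[b], edges[b + 1])

# Toy data: the last sample is missing attribute 1 (marked with NaN).
X = np.array([[1.0, 2.0], [1.1, 2.2], [0.9, 1.8], [5.0, 9.0], [1.0, np.nan]])
values = neighbor_values(X, i=4, j=1, q=3)
interval, edges, probs = probability_model(values)
estimate = sample_estimate(edges, probs, np.random.default_rng(0))
print(interval, probs, estimate)
```

In this toy example the distant sample `[5.0, 9.0]` is excluded from the neighbor set, so the interval and the probabilities are shaped only by the three samples closest to the incomplete one. An optimizer such as a genetic algorithm or gradient descent would then search inside that interval, favoring sub-intervals with higher probability, before the completed data set is passed to FCM.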
Keywords/Search Tags: Probability Model, Fuzzy c-means, Genetic Algorithm, Gradient Descent