Research On Fuzzy Clustering Algorithm For Categorical Data

Posted on: 2019-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: S J Wang
Full Text: PDF
GTID: 2428330566480052
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, both the scale of data and people's access to it keep increasing, and handling such massive data has become a topic of concern in recent years. As a general knowledge discovery technology, data mining is the process of discovering models and relationships in large amounts of data, and cluster analysis is one of its important data processing methods. Because data increasingly exhibits diverse attribute types, massive scale, and heterogeneous distributions, different data characteristics require different clustering algorithms. Cluster analysis of numerical data has achieved significant results, but practical applications also involve a large amount of categorical data. Categorical data lacks the inherent geometric properties of numerical data, so it requires different clustering algorithms and models, and research on clustering algorithms for categorical data has attracted widespread attention in recent years.

Fuzzy clustering applies fuzzy set theory to cluster analysis, which improves its ability to process data and lets it reflect reality more objectively, so it is widely used in many fields. The fuzzy k-modes (FKM) algorithm is an important fuzzy clustering algorithm for categorical data. Its strong local search ability and fast convergence have made it a focus of research on fuzzy clustering of categorical data. However, the FKM algorithm is sensitive to the choice of initial centers: different initial centers lead to different final clustering results. In addition, because of its strong local search ability, the FKM algorithm easily falls into a local optimum. To address these problems, this thesis carries out the following work:

(1) An initial center selection algorithm combined with outlier detection is proposed. To address the sensitivity of the FKM algorithm to initial center selection, the relationship between distance and density in the initial center computation is adjusted by increasing the weight coefficient of the distance term in the formula, so that the selected initial centers are more dispersed. Distance-based outlier detection is then applied to filter the candidate set produced by the improved selection step, eliminating the candidate points with a high degree of outlierness. Experimental results show that the improved initial center selection method improves the accuracy and precision of the FKM algorithm and reduces its sensitivity to initial center selection.

(2) An improved genetic algorithm-based fuzzy clustering algorithm (IGAFKM) is proposed. A genetic algorithm is a global optimization algorithm that searches for the optimal solution by simulating natural evolution; its solution process is simple and its search range is wide. Combining it with the fuzzy clustering algorithm lets the random search of the genetic algorithm improve the global optimization ability of fuzzy clustering and accelerate convergence. The crossover and mutation probabilities of the genetic algorithm are adjusted dynamically to enhance population diversity, keep the algorithm from falling into a local optimum, and accelerate convergence toward the global optimum, thereby improving the global optimization ability of the FKM algorithm. Experimental results show that the IGAFKM algorithm converges faster than both the FKM and GAFKM algorithms, and that its accuracy is also better than that of both.
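The abstract does not reproduce the FKM update rules, so the following is only a minimal sketch of the baseline fuzzy k-modes iteration that the thesis builds on. It assumes the commonly used formulation: simple matching dissimilarity, a fuzziness exponent `alpha > 1`, and mode updates by weighted category frequency. It does not implement the proposed initial center selection or the IGAFKM variant, and all function names are hypothetical.

```python
import numpy as np

def matching_distance(X, modes):
    """Simple matching dissimilarity: number of attributes that differ.
    X has shape (n, m), modes has shape (k, m); the result has shape (k, n)."""
    return (X[None, :, :] != modes[:, None, :]).sum(axis=2)

def update_memberships(X, modes, alpha):
    """Fuzzy membership of each object in each cluster (columns sum to 1)."""
    d = matching_distance(X, modes).astype(float)
    k, n = d.shape
    W = np.zeros((k, n))
    for i in range(n):
        zero = d[:, i] == 0
        if zero.any():
            # The object coincides with a mode: full membership in the first such cluster.
            W[np.argmax(zero), i] = 1.0
        else:
            inv = d[:, i] ** (-1.0 / (alpha - 1.0))
            W[:, i] = inv / inv.sum()
    return W

def update_modes(X, W, alpha):
    """Per attribute, pick the category with the largest sum of W**alpha in each cluster."""
    k, m = W.shape[0], X.shape[1]
    modes = np.empty((k, m), dtype=X.dtype)
    Wa = W ** alpha
    for j in range(m):
        cats = np.unique(X[:, j])
        # scores[l, c] = weighted support of category cats[c] in cluster l
        scores = np.stack([Wa[:, X[:, j] == c].sum(axis=1) for c in cats], axis=1)
        modes[:, j] = cats[scores.argmax(axis=1)]
    return modes

def fuzzy_k_modes(X, init_modes, alpha=1.5, max_iter=100):
    """Alternate membership and mode updates until the modes stop changing."""
    modes = np.array(init_modes)
    for _ in range(max_iter):
        W = update_memberships(X, modes, alpha)
        new_modes = update_modes(X, W, alpha)
        converged = np.array_equal(new_modes, modes)
        modes = new_modes
        if converged:
            break
    return update_memberships(X, modes, alpha), modes
```

Because the result depends entirely on `init_modes`, running this sketch from different starting modes is a simple way to observe the initialization sensitivity and local optima that the thesis sets out to reduce.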
Keywords/Search Tags: Categorical data, Fuzzy clustering, Genetic algorithm, Initial center selection, Dynamic adjustment