
Theoretical And Applied Research On Fuzzy C-means Clustering And Its Cluster Validation

Posted on: 2015-11-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K L Zhou
GTID: 1228330467487002
Subject: Management Science and Engineering
Abstract/Summary:
Clustering is an important research topic in data analytics, knowledge discovery, intelligent decision-making and many other areas. The fuzzy c-means (FCM) algorithm is one of the most widely applied fuzzy clustering methods. Unlike crisp clustering methods such as K-means and hierarchical clustering, FCM introduces the concepts of membership degree and fuzzifier, which has made it more widely applicable. However, the traditional FCM algorithm has several shortcomings: it is difficult to determine the optimal number of clusters, the partition is easily affected by the data distribution, the value of the fuzzifier can significantly influence the clustering results, and the algorithm easily falls into local minima. In the Big Data environment in particular, data volumes are increasing dramatically and data structures are becoming more complex, which poses great challenges for the application of traditional FCM clustering. Theoretical and applied research on FCM clustering and its cluster validation is therefore of great significance for improving the performance of FCM clustering, enriching clustering theory, and promoting the wide application of fuzzy clustering.

Based on a review of related studies on FCM clustering and cluster validation, and combining theoretical analysis, data experiments and applied research, this dissertation studies FCM clustering and its cluster validation methods, as well as their application to the classification of electric power load data. The main research contents and innovations are as follows:

(1) This dissertation proposes a weighted summation type fuzzy cluster validity index (CVI). First, well-known fuzzy CVIs and their deficiencies are systematically summarized, analyzed and reviewed. On this basis, a weighted summation type CVI (WSCVI) is constructed for FCM clustering.
Experimental results demonstrate that, with properly set weights for each constituent CVI, the WSCVI overcomes the deficiencies of traditional CVIs for fuzzy clustering and better identifies the optimal number of clusters for a given data set, which provides a new direction for research on the cluster validation of FCM clustering.

(2) This dissertation constructs a novel CVI for FCM clustering, called COS, that accounts for differences in cluster sizes and densities within a data set. COS is proposed to solve the fuzzy cluster validation problem for data sets whose clusters differ greatly in size and density. The COS index has three components: a compactness measure, an overlap measure and a separation measure. Intra-cluster compactness is expressed as the ratio of the sum of membership degrees within a certain threshold to the maximum intra-cluster distance. The overlap between two clusters is measured by the difference between a point's membership degrees to those two clusters, within a certain threshold. Inter-cluster separation is measured by the minimum distance between clusters. The optimal number of clusters is the one that maximizes the COS index. Experimental results show that the COS index can effectively discover small and low-density clusters, which provides theoretical support for the fuzzy cluster validation of data sets whose clusters differ greatly in size and density.

(3) This dissertation investigates the uniform effect of FCM clustering. Theoretically, the composition of the FCM objective function is analyzed, and three factors that may influence the FCM clustering result are identified: $n_i n_j$, $\|v_i - v_j\|^2$ and $\mu^m$. On this basis, a cluster validity criterion, a fuzzifier selection criterion and a corresponding selection algorithm are proposed from the data distribution perspective.
Extensive experiments on both synthetic and real-world data sets further demonstrate the uniform effect of FCM clustering and the influence of the fuzzifier on this effect. This provides guidance for understanding the partitions produced by FCM, improving the performance of FCM clustering in applications, and selecting an appropriate fuzzifier value.

(4) This dissertation proposes a fuzzifier selection method from the cluster validity perspective and applies it to load profile grouping. Following the idea of selecting the optimal number of clusters with CVIs, the method selects the fuzzifier by cluster validation, and its specific steps are presented. Experimental results show that the fuzzifier values widely used for FCM clustering in existing studies are not always optimal. Applying this method to load profile grouping in the smart grid environment improves the practicality of FCM clustering.

(5) This dissertation explores applications of FCM clustering optimized by intelligent algorithms to electric power load characteristics classification and demand side management (DSM). To address the tendency of FCM clustering to fall into local optima, this study indicates that simulated annealing and genetic algorithm optimization can effectively improve the global search capability of FCM clustering. The simulated annealing and genetic algorithm optimized FCM (SAGA-FCM) algorithm is applied to load characteristics classification and load profile grouping. It improves the accuracy and effectiveness of load classification, providing more effective support for implementing DSM programs and for other decision-making in power systems.
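For reference, the standard FCM formulation (due to Bezdek) that the objective-function analysis in item (3) starts from is written out below, using $v_i$ for the cluster centres and $\mu_{ij}$ for memberships as in the abstract, and $x_j$ (an assumed symbol) for the $n$ data points; $m > 1$ is the fuzzifier. Alternating the two update rules never increases $J_m$ and typically converges to a local minimum, which is why initialization matters. The dissertation's own decomposition into the $n_i n_j$, $\|v_i - v_j\|^2$ and $\mu^m$ factors is not reproduced here.

```latex
J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \, \lVert x_j - v_i \rVert^2,
\qquad \text{s.t. } \sum_{i=1}^{c} \mu_{ij} = 1, \quad \mu_{ij} \in [0, 1]

v_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{m} \, x_j}{\sum_{j=1}^{n} \mu_{ij}^{m}},
\qquad
\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{2/(m-1)} \right]^{-1}
```

As $m \to 1^{+}$ the memberships approach crisp 0/1 assignments (the K-means limit), and as $m \to \infty$ they approach $1/c$, so the fuzzifier directly controls how soft the partition is.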
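Several of the contributions above build on the standard practice of running FCM over a range of cluster numbers and keeping the partition that scores best on a validity index. The sketch below shows that loop in NumPy, assuming Euclidean distance and a fixed fuzzifier m = 2. It uses the well-known Xie-Beni index as a stand-in (smaller is better); it is not the WSCVI or COS index, whose exact formulas are not given here. Random restarts that keep the lowest-objective run are a simple, swapped-in hedge against local minima, not the SAGA-FCM method. The function names `sq_dists`, `fcm`, `xie_beni` and `select_c` are hypothetical.

```python
import numpy as np


def sq_dists(X, V):
    """Squared Euclidean distances; entry [i, j] is ||X[j] - V[i]||^2, shape (c, n)."""
    return ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, n_init=5, seed=0):
    """Standard (Bezdek) FCM; returns centres V (c x d) and memberships U (c x n).

    The n_init random restarts are an illustrative assumption to soften the
    local-minimum problem; they are NOT the SAGA-FCM optimization.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        # Random initial memberships; each column (one data point) sums to 1.
        U = rng.random((c, X.shape[0]))
        U /= U.sum(axis=0, keepdims=True)
        for _it in range(max_iter):
            Um = U ** m
            # Centre update: membership^m-weighted mean of the data.
            V = (Um @ X) / Um.sum(axis=1, keepdims=True)
            # Floor distances so a point sitting on a centre cannot divide by zero.
            d2 = np.fmax(sq_dists(X, V), 1e-12)
            # Membership update: mu_ij proportional to d2_ij^(-1/(m-1)), normalized over clusters.
            inv = d2 ** (-1.0 / (m - 1.0))
            U_new = inv / inv.sum(axis=0, keepdims=True)
            converged = np.abs(U_new - U).max() < tol
            U = U_new
            if converged:
                break
        # Recompute centres from the final memberships and score this restart by J_m.
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        J = (Um * sq_dists(X, V)).sum()
        if best is None or J < best[0]:
            best = (J, V, U)
    return best[1], best[2]


def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index with membership exponent m (smaller is better)."""
    c = V.shape[0]
    compactness = ((U ** m) * sq_dists(X, V)).sum()
    # Minimum squared distance between two distinct centres.
    separation = sq_dists(V, V)[~np.eye(c, dtype=bool)].min()
    return compactness / (X.shape[0] * separation)


def select_c(X, c_range=range(2, 7), m=2.0):
    """Run FCM for each candidate c and keep the c with the lowest Xie-Beni value."""
    scores = {c: xie_beni(X, *fcm(X, c, m=m), m=m) for c in c_range}
    return min(scores, key=scores.get), scores


# Usage: three well-separated synthetic Gaussian blobs (hypothetical data).
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((50, 2)) for ctr in centres])
best_c, scores = select_c(X)
print("selected c:", best_c)
print({c: round(float(s), 4) for c, s in scores.items()})
```

The sketch holds m fixed on purpose: raw index values such as Xie-Beni computed with exponent m are not directly comparable across different fuzzifier values, so choosing m by validation needs more care than simply taking the minimum over a grid.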
Keywords/Search Tags: Fuzzy C-means (FCM), Cluster validation, Cluster validity index (CVI), Fuzzifier, Data distribution, Intelligent algorithm optimization, Load data clustering