Fuzzy C-means Clustering Algorithm Based On Mahalanobis Distance For Compositional Data

Posted on: 2020-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2428330623464662
Subject: Application probability statistics
Abstract/Summary:
Compositional data are a ubiquitous type of data that carry only relative information: each variable represents its relative contribution to a whole. Describing an object with compositional data therefore reflects its structural characteristics. Because the components of a composition are subject to non-negativity and constant-sum constraints, analyzing them directly as ordinary data in Euclidean space leads to seriously biased results. The analysis must therefore respect the basic principles of compositional data analysis.

Similarity measurement is an important issue in statistical research, and the Mahalanobis distance is an important similarity measure in statistical analysis. Whether the Mahalanobis distance is suitable as a similarity measure for compositional data is a question worth studying. Fuzzy clustering is one of the statistical methods most closely tied to the similarity measure, since the choice of measure directly affects the clustering result, and fuzzy C-means is one of the most widely used fuzzy clustering methods. The thesis therefore constructs a fuzzy C-means clustering algorithm based on the Mahalanobis distance for compositional data, which has both theoretical value and practical significance.

The thesis first rigorously proves that the Mahalanobis distance based on three different log-ratio transformations satisfies the similarity measure criteria for compositional data. It then proposes a fuzzy C-means algorithm based on the Mahalanobis distance for compositional data (FCM-ML) and verifies its validity by numerical simulation and empirical analysis. Finally, it applies FCM-ML to the imputation of missing values in compositional data, yielding the FCM-ML imputation method, and evaluates that method through numerical simulation and empirical analysis. The research thus enriches both the fuzzy C-means clustering algorithms and the imputation methods available for compositional data.

The research content specifically includes:

1. Palarea-Albaladejo et al. (2012) propose that a similarity measure for compositional data should satisfy three basic criteria: scale invariance, perturbation invariance, and subcompositional dominance. The thesis tests whether the Mahalanobis distance computed after each of three log-ratio transformations satisfies these criteria.
2. The thesis applies the Mahalanobis distance for compositional data to the fuzzy C-means clustering algorithm. For complete compositional datasets, it proposes the FCM-ML algorithm and compares it, through numerical simulation and empirical analysis, with two alternatives: fuzzy C-means with the Mahalanobis distance applied to the raw compositions as if they were ordinary Euclidean data (FCM-ML(crude)), and fuzzy C-means based on the Aitchison distance (FCM-Aitchison), proposed by Palarea-Albaladejo et al. (2012). The comparison establishes the validity and scope of the FCM-ML algorithm.
3. The thesis applies the FCM-ML algorithm to missing values in compositional data. For compositional datasets with missing values, it proposes an imputation method based on FCM-ML and compares it, through numerical simulation and empirical analysis, with K-Nearest Neighbor imputation (KNN) and iterative regression imputation (LISR) (Hron 2010).

The research indicates:

1. Because they are equivalent, the Mahalanobis distances obtained through the three different log-ratio transformations can be unified into a single Mahalanobis distance for compositional data. This distance satisfies the three basic criteria proposed by Palarea-Albaladejo et al. (2012) and can be used as a similarity measure for compositional data.
2. In the comparison of fuzzy C-means algorithms on complete compositional datasets, FCM-ML(crude) ignores the characteristics of compositional data and treats the compositions as ordinary Euclidean data, so it clusters less well than the FCM-ML algorithm proposed in the thesis. Because FCM-ML also accounts for correlation between components, it outperforms FCM-Aitchison when the components are correlated, and the reverse holds otherwise. Each of the two algorithms therefore has its own range of applicability.
3. In the comparison of imputation methods for compositional data, at a fixed missing rate the performance of FCM-ML imputation improves as the correlation between components increases, and it is better than KNN and LISR. At a fixed level of correlation, the performance of FCM-ML imputation declines gradually as the missing rate increases, but it remains better than KNN and LISR.

The thesis makes two possible contributions. First, starting from the special nature of compositional data, it examines whether the Mahalanobis distance can serve as a similarity measure for such data, and shows that the Mahalanobis distance computed after a log-ratio transformation meets the requirements of a similarity measure for compositional data. Second, it proposes the FCM-ML algorithm and applies it to the imputation of missing values in compositional data. Numerical simulation and empirical analysis of the effectiveness and scope of the proposed method show that it has certain advantages.
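The invariance and equivalence claims in the abstract can be checked numerically. The following is a minimal sketch, not the thesis's implementation: it assumes the covariance in the Mahalanobis distance is the sample covariance of the log-ratio coordinates, and the helper names (`closure`, `alr`, `ilr`, `helmert_basis`, `mahalanobis_logratio`) are hypothetical. It covers only the additive (alr) and isometric (ilr) log-ratio transformations; the centered log-ratio (clr) has a singular covariance matrix and would need a generalized inverse, which is omitted here.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return x / x.sum(axis=1, keepdims=True)

def alr(x):
    """Additive log-ratio: log of each part over the last part."""
    logx = np.log(closure(x))
    return logx[:, :-1] - logx[:, -1:]

def helmert_basis(d):
    """Orthonormal rows spanning the zero-sum hyperplane in R^d."""
    v = np.zeros((d - 1, d))
    for i in range(1, d):
        v[i - 1, :i] = 1.0 / i
        v[i - 1, i] = -1.0
        v[i - 1] *= np.sqrt(i / (i + 1.0))
    return v

def ilr(x):
    """Isometric log-ratio coordinates with respect to helmert_basis."""
    logx = np.log(closure(x))
    clr = logx - logx.mean(axis=1, keepdims=True)
    return clr @ helmert_basis(clr.shape[1]).T

def mahalanobis_logratio(data, i, j, transform=ilr):
    """Mahalanobis distance between rows i and j in log-ratio coordinates.

    Assumption: the covariance is estimated from the whole transformed dataset.
    """
    z = transform(data)
    cov = np.cov(z, rowvar=False)
    diff = z[i] - z[j]
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.dirichlet([2.0, 3.0, 4.0, 5.0], size=30)   # synthetic compositions
    base = mahalanobis_logratio(data, 0, 1)

    # Scale invariance: rescaling each row by its own positive constant.
    scaled = data * rng.uniform(1.0, 10.0, size=(30, 1))
    # Perturbation invariance: multiplying every composition part-wise
    # by the same positive vector.
    perturbed = data * np.array([0.5, 2.0, 3.0, 1.0])

    print(np.isclose(base, mahalanobis_logratio(scaled, 0, 1)))
    print(np.isclose(base, mahalanobis_logratio(perturbed, 0, 1)))
    # Equivalence across transformations: alr and ilr give the same distance.
    print(np.isclose(base, mahalanobis_logratio(data, 0, 1, transform=alr)))
```

The alr and ilr results agree because the two coordinate systems are related by an invertible linear map, and the Mahalanobis distance is unchanged by such maps when the covariance is estimated in the same coordinates.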
Keywords/Search Tags: Compositional data, Logratio transformation, Mahalanobis distance, FCM-ML algorithm, FCM-ML imputation method