
The Application And Research On Fuzzy Multiple Criteria Decision Making In Clustering Analysis Of Big Data

Posted on: 2020-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: X M Li
Full Text: PDF
GTID: 2428330575988000
Subject: Computer application technology
Abstract/Summary:
Clustering analysis is an important part of big data processing. It groups data elements with similar attributes into the same cluster and separates dissimilar elements into different clusters. The same clustering algorithm can perform differently on different data sets, sometimes greatly so. Many studies address the improvement or parallelization of clustering algorithms, but research on selecting a clustering algorithm or a k-value for a given data set is relatively rare. This thesis studies how to choose the optimal clustering algorithm and the optimal number of clusters for a given data set, which may be either an ordinary data set or a large one.

For algorithm selection on ordinary data sets, several classical clustering algorithms are compared and analyzed, including partition-based, hierarchical, density-based and model-based algorithms. Multiple external cluster validity indices are then selected to assess the quality of each algorithm. These evaluation values form a decision matrix, which is processed by a multi-criteria decision-making model. For k-value selection, this thesis selects several internal validity indices, evaluates them for different k-values to form the decision matrix, and processes that matrix with the multi-criteria decision-making model.

Constructing a fuzzy multi-criteria decision-making model first requires choosing among several weight determination methods: subjective, objective and combined. Uncertain information is expressed with fuzzy numbers, here linguistic term sets and intuitionistic fuzzy sets. Linguistic term sets assign evaluation values to specific characteristics of a clustering algorithm, and these values constitute the subjective weights. Popular multi-criteria decision-making methods, such as the generalized MSM operator and the weighted MSM operator, are also considered, and finally an appropriate weight determination method and decision-making method are chosen.

K-means is a classic and widely used clustering algorithm. This thesis improves it to obtain a better clustering effect. Because the amount of data is very large, the improved k-means algorithm is parallelized on the Spark platform. Several other k-means clustering methods are also parallelized on Spark and compared with the improved algorithm, again using the multi-criteria decision-making model, to demonstrate the superiority of the improved algorithm.
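
The decision-matrix step described in the abstract can be illustrated with a small sketch. It assumes that "MSM" denotes the Maclaurin symmetric mean, that every validity index has already been normalized to [0, 1] with larger values being better, and that the plain (unweighted, non-fuzzy) operator is used. The function names, the order parameter k and all scores are hypothetical placeholders chosen for illustration; they are not taken from the thesis and do not reproduce its model or results.

```python
from itertools import combinations
from math import comb, prod


def msm(values, k):
    """Maclaurin symmetric mean of order k for non-negative values.

    MSM^(k)(x_1..x_n) = (sum of products over all k-element subsets / C(n, k)) ** (1 / k)
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    total = sum(prod(subset) for subset in combinations(values, k))
    return (total / comb(n, k)) ** (1.0 / k)


def rank_alternatives(decision_matrix, k):
    """Aggregate each row with the MSM and return (names sorted best-first, scores)."""
    scores = {name: msm(row, k) for name, row in decision_matrix.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical decision matrix: one row per clustering algorithm, one column
# per normalized external validity index. The numbers are made up.
decision_matrix = {
    "k-means": [0.80, 0.70, 0.75],
    "hierarchical": [0.60, 0.65, 0.70],
    "DBSCAN": [0.50, 0.90, 0.40],
}

ranking, scores = rank_alternatives(decision_matrix, k=2)
for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

With k = 1 the operator reduces to the arithmetic mean and with k = n to the geometric mean, so intermediate orders reward alternatives that score well on several criteria at once. The same ranking routine could be applied to k-value selection by using candidate k-values as rows and internal validity indices as columns.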
Keywords/Search Tags: clustering algorithm, cluster validity evaluation, multi-criteria decision making, fuzzy number, k-means algorithm