
Research On Probability Set-valued Data Clustering Method

Posted on: 2022-04-29  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Zhu  Full Text: PDF
GTID: 2518306521981679  Subject: Economic big data analysis
Abstract/Summary:
In the era of big data, the volume of available data has exploded, data types have diversified, and data structures have become more complex. Probabilistic set-valued data, a typical kind of complex data, describes set-valued samples together with probability distribution information. How to explore and mine such data is an important research problem in the big data field. Because labeled samples are often hard to obtain in practice, using clustering to extract useful knowledge from unlabeled probabilistic set-valued data, and thereby guide decision-making, is of great practical value. Traditional clustering algorithms are usually based on distance measures, but these measures are designed for a single type of data and cannot be applied directly to probabilistic set-valued data, which has both probabilistic and set-valued features. To solve this problem, this thesis proposes a new distance measure and combines it with fuzzy clustering techniques to construct a clustering algorithm for probabilistic set-valued data. The main research results are as follows:

1) A new distance measure is constructed for probabilistic set-valued data. Because such data has both probabilistic and set-valued features, how to measure distance effectively across these two kinds of features is a central issue of this thesis. We consider the two together: f-divergence describes the difference in the probabilistic features, while the difference between the sets under each categorical feature captures the set-valued part. The two jointly determine the distance between two probabilistic set-valued objects, which overcomes the limitation of traditional distance measures to a single data type.

2) Based on the new distance measure, a new clustering algorithm for probabilistic set-valued data, the Fuzzy Probability Set-Valued Clustering Algorithm (FPSC), is proposed. FPSC optimizes and improves the traditional fuzzy clustering algorithm. Although the traditional fuzzy algorithm allows an object to be assigned to one or more clusters, which addresses the limitations of hard clustering in real scenarios, it leaves the selection of cluster centers unclear. To better capture the cluster centers, we introduce a prototype weight for each sample. Any object in the data set whose prototype weight is nonzero can represent the cluster center to some extent, and the greater the prototype weight, the more representative the sample is of the cluster center. This approach removes the restriction that a single object serves as the cluster center and, compared with existing clustering methods for single-type data, allows FPSC to capture the basic structure of the cluster centers more accurately. For these improvements, we propose a new objective function, give an alternating optimization algorithm to solve it, and provide a complexity analysis and a theoretical proof of convergence for FPSC.

3) In the experimental part, we select six real UCI data sets. The results show that FPSC outperforms other traditional clustering algorithms on three clustering indicators: accuracy, purity, and FMI (Fowlkes-Mallows index). We also conduct a parameter sensitivity analysis. Based on the performance on the six data sets, we conclude that FPSC is not sensitive to its parameters, that is, it is robust to parameter choice, which simplifies the selection of parameter values and ranges.
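
The distance idea in point 1) can be illustrated with a minimal Python sketch. It is not the thesis's actual formula, which the abstract does not give. The sketch assumes that each object maps a categorical attribute to a dictionary of value probabilities, uses total variation distance as one concrete f-divergence, uses Jaccard distance for the set part, and mixes the two with a hypothetical weight `alpha`. All names (`psv_distance`, `alpha`, the data layout) are illustrative assumptions.

```python
from typing import Dict, Set

# Assumed representation (not from the thesis): each object maps a
# categorical attribute to {value: probability}, summing to 1 per attribute.
PSVObject = Dict[str, Dict[str, float]]


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance, an f-divergence with f(t) = |t - 1| / 2."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def psv_distance(x: PSVObject, y: PSVObject, alpha: float = 0.5) -> float:
    """Average over attributes of a weighted sum of the probability term
    and the set term. `alpha` is a hypothetical mixing weight."""
    attrs = set(x) | set(y)
    if not attrs:
        return 0.0
    total = 0.0
    for attr in attrs:
        p = x.get(attr, {})
        q = y.get(attr, {})
        # Set-valued part: values that occur with positive probability.
        set_p = {v for v, pr in p.items() if pr > 0}
        set_q = {v for v, pr in q.items() if pr > 0}
        total += alpha * total_variation(p, q)
        total += (1.0 - alpha) * jaccard_distance(set_p, set_q)
    return total / len(attrs)


# Two toy objects with a single set-valued attribute.
x = {"symptom": {"fever": 0.7, "cough": 0.3}}
y = {"symptom": {"fever": 0.2, "headache": 0.8}}
d_xy = psv_distance(x, y)
```

Total variation is chosen here only because it is bounded and stays well defined when the two value sets differ. Other f-divergences, such as the Kullback-Leibler divergence, would need smoothing for values that appear in only one object.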
Keywords/Search Tags: probabilistic set-valued data, distance measurement, fuzzy probability set-valued clustering algorithm, fuzzy technology