
Research And Application Of Clustering Algorithm Based On Maximum Entropy

Posted on: 2021-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: L H Xiang
GTID: 2518306470461224
Subject: Mathematics
Abstract/Summary:
Cluster analysis has become an important analytical tool in data mining. Within the taxonomy of machine learning, it is a form of unsupervised learning. As the field has developed in response to real problems, two ideas have attracted wide attention from researchers in cluster analysis: fuzzy theory, which can effectively describe the uncertainty between samples, and semi-supervised learning, which can make use of prior knowledge. Building on this earlier theory, a semi-supervised cluster analysis algorithm based on fuzzy theory has been proposed by extending the maximum entropy clustering algorithm. The central idea of the mathematical model is to transform the cluster analysis problem into a constrained mathematical problem; solving it by optimization determines the fuzzy partition and clustering of the sample data set. This line of research has produced important results, but it remains worth exploring how to use an objective function based on pairwise constraints effectively for the specific characteristics of different data sets. Starting from the semi-supervised clustering algorithm based on power divergence and pairwise constraints (PD-sSC), this thesis studies the case in which the clustering performance of the PD-sSC algorithm is less than ideal because the number of pairwise constraints is large. To address this problem, the thesis does the following work on the basis of the relevant theory.

1. To handle a large number of pairwise constraints, a closure criterion is introduced to pack the must-link constraint classes among the pairwise constraints. This reduces the number of pairwise constraints on the original data, and the center point of each package replaces the entire closure structure when the new cannot-link constraints are reconstructed, so that the algorithm does not violate must-link constraints during clustering. Note that the new samples carry the updated cannot-link constraints but no must-link constraints. Finally, a new objective function is constructed on the basis of the PD-sSC algorithm to obtain the CCPC algorithm, which is then used to cluster the new samples.

2. When the CCPC algorithm is implemented, normalizing the membership vector can cause arithmetic overflow. To solve this, a censored exponential function is added to control the computation, which reduces the difficulty of programming. Experimental testing shows that this replacement has a negligible effect on the clustering results of the CCPC algorithm.

3. Experiments are carried out using UCI standard data and collected real-world feature data. Using the closure criterion and three cluster evaluation indicators, a corresponding experimental procedure is designed for comparison with the PD-sSC, CE-sSC, PCBKM, and SSC algorithms. The experimental results show that the CCPC algorithm performs well whether the number of pairwise constraints is large or small.
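The closure step in item 1 can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `pack_must_links`, the use of a union-find structure, the choice of the arithmetic mean as the "center point", and the treatment of unconstrained samples as one-element packages are all assumptions made for illustration.

```python
import numpy as np

def pack_must_links(X, must_link, cannot_link):
    """Collapse each must-link closure to one center point (hypothetical helper).

    X           : (n, d) array of samples
    must_link   : list of index pairs (a, b) that must share a cluster
    cannot_link : list of index pairs (a, b) that must not share a cluster
    Returns the packed samples, the cannot-link pairs rewritten in terms of
    packages, and the package index of every original sample.
    """
    n = len(X)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Transitive closure of the must-link relation.
    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    roots = sorted({find(i) for i in range(n)})
    group_of = {r: g for g, r in enumerate(roots)}
    labels = np.array([group_of[find(i)] for i in range(n)])

    # Assumption: the "center point" of a package is the mean of its members.
    centers = np.vstack([X[labels == g].mean(axis=0) for g in range(len(roots))])

    # Rebuild cannot-link constraints between packages, dropping duplicates.
    new_cannot_link = set()
    for a, b in cannot_link:
        ga, gb = int(labels[a]), int(labels[b])
        if ga == gb:
            raise ValueError("cannot-link pair lies inside one must-link closure")
        new_cannot_link.add((min(ga, gb), max(ga, gb)))

    return centers, sorted(new_cannot_link), labels
```

After packing, the data passed to the clustering step has no must-link constraints left, only cannot-link constraints between packages, which matches the description above.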
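The overflow problem in item 2 can be sketched as follows. The abstract does not give the exact form of the censored exponential function or of the CCPC membership update, so this example assumes a generic exponential membership rule (weights proportional to `exp(score)`) and a hypothetical cap of 700, chosen only because it lies below the point where `exp` overflows in double precision.

```python
import numpy as np

def censored_exp(x, cap=700.0):
    """Exponential with its argument clipped to [-cap, cap] (assumed form).

    Clipping keeps exp() finite and nonzero in float64; the cap value is an
    illustrative assumption, not a value taken from the thesis.
    """
    return np.exp(np.clip(x, -cap, cap))

def normalize_memberships(scores):
    """Turn an (n_samples, n_clusters) score matrix into rows that sum to 1."""
    weights = censored_exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)
```

With a plain `np.exp`, a score of 1000 would evaluate to infinity and the normalized row would contain NaN; with the censored version every weight stays finite, provided the number of clusters is small enough that the row sum itself does not overflow.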
Keywords/Search Tags:Semi-supervised clustering, Fuzzy theory, Power-divergence, Pairwise Constraints, Closure Criterion