
The Research And Application Of Improved Data Competition Clustering Algorithm

Posted on: 2019-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: J N Xu
GTID: 2428330548481385
Subject: Computer Science and Technology
Abstract/Summary:
Data analysis technology is widely used in many fields, and cluster analysis, one of its important research directions, has developed considerably. The data competition (DC) algorithm is a partition-based clustering algorithm that can eliminate the interference caused by isolated points and gives stable clustering results. It has been applied to image segmentation and text clustering, but it still has deficiencies that call for further research: the number of clusters must be set manually, and the clustering results on manifold, ring, and complex-structure data sets are not ideal. To address these problems, the dissertation compares the strengths and weaknesses of several recently proposed clustering algorithms, such as density peak clustering and spectral clustering, on different clustering problems. Inspired by these ideas, it proposes improvements to the DC algorithm and applies them to color image segmentation. The main work of the dissertation is as follows:

1. Because the DC algorithm requires the number of clusters to be set manually and cannot determine the cluster centers automatically, an automatically determined data competition algorithm (ADDC) is proposed. A data field model replaces the aggregated field model, and the distribution characteristics of the data set are described by the potential of each data point. After the competition, a histogram is constructed from the gamma distribution of the data points to determine the critical threshold automatically and select the quasi-cluster center points; among these, the attenuation center of the data field is used to select the actual cluster center points, which completes the automatic determination of the cluster centers. Experimental results on synthetic and UCI data sets show that the ADDC algorithm not only determines the cluster centers automatically but also improves the clustering results.

2. Because the DC algorithm cannot effectively describe the similarity between data points with manifold, ring, and multi-scale distributions, a data competition algorithm based on geodesic distance and density adjustment (GDDA-DC) is proposed. The algorithm improves the adaptive similarity function: a similarity function based on geodesic distance and density adjustment is designed to describe the distribution characteristics of the data space accurately. After the cluster centers have been determined by competition, the remaining data points are assigned in the manner of density peak clustering to complete the clustering. Comparative experiments with multiple algorithms show that the GDDA-DC algorithm has higher clustering accuracy, especially on multi-scale, ring, and manifold data sets.

3. Because the DC algorithm cannot effectively cluster complex data sets, a data competition algorithm based on dense coefficients and local similarity (DCLS-DC) is proposed. A dense similarity formula is designed, and the dense coefficient that it yields for each data point is used to weight the scaling parameter of the adaptive similarity function, which reduces the similarity between data points in sparse regions. A local similarity index of the internal data points is then calculated to further mine the structural information of the clusters. After the cluster centers have been determined by competition, the remaining data points are assigned in the manner of density peak clustering to complete the clustering. Theoretical analysis and comparative experiments with various algorithms show that the DCLS-DC algorithm can process data sets with complex structure and that its clustering results are stable.

4. The GDDA-DC and DCLS-DC algorithms are applied to image segmentation. The SLIC superpixel algorithm is used to pre-segment the image, and multiple algorithms are then used to perform the secondary segmentation for comparison. Experiments on the Berkeley image data sets and on real photographs show that the improved data competition algorithms segment images better than the other algorithms compared.
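
As an illustration of why the geodesic distance used by GDDA-DC (point 2 above) suits ring and manifold data, the sketch below approximates geodesic distance by shortest paths on a k-nearest-neighbour graph. The abstract does not say how the thesis computes geodesic distance, so the graph construction, the use of Dijkstra's algorithm, and all function names (`knn_graph`, `geodesic_distances`) are assumptions made for this example only, not the author's method.

```python
import heapq
import math


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph symmetric
    return graph


def geodesic_distances(graph, source):
    """Shortest-path lengths from `source` along graph edges (Dijkstra)."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy ring data set: 40 points evenly spaced on the unit circle.
ring = [
    (math.cos(2 * math.pi * t / 40), math.sin(2 * math.pi * t / 40))
    for t in range(40)
]
graph = knn_graph(ring, k=2)
geo = geodesic_distances(graph, source=0)

euclid_opposite = math.dist(ring[0], ring[20])  # straight line across the ring
geo_opposite = geo[20]                          # path that follows the ring

print(f"Euclidean: {euclid_opposite:.3f}, geodesic: {geo_opposite:.3f}")
```

On this toy ring the two opposite points are 2.0 apart in a straight line but roughly 3.14 apart along the ring, so a similarity function built on the graph distance reflects how the points are connected through the data rather than how close they are in the ambient space.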
Keywords: DC clustering algorithm, data field, automatic clustering, similarity matrix, geodesic distance, image segmentation