Non-uniform Data Clustering Method Based On Relative Density

Posted on: 2022-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
GTID: 2518306602466014
Subject: Applied Mathematics
Abstract/Summary:
With technological progress, the Internet is used more and more frequently, and a large amount of data is produced as a result. If the data generated in daily life can be put to good use, it can make people's lives more convenient. Data mining emerged to address this problem: by analyzing the structure of data and the correlations within it, data mining filters out useless information and extracts more valuable information. Clustering, an important data mining technique, is an unsupervised learning method that divides data into clusters without any prior knowledge, so that the similarity between different clusters is as small as possible while the similarity between data within the same cluster is as large as possible. Density-based clustering methods regard clusters as high-density regions separated by low-density regions, so they can identify clusters of arbitrary shape in a data set. Classical density-based clustering methods can effectively identify non-convex clusters and noise, but they perform poorly on data sets with uneven density distributions. To handle such more complex cases, this thesis proposes two new clustering methods.

First, a clustering method based on relative density and mutual neighbors (RDMN) is proposed to solve the problem that low-density clusters cannot be identified in data sets with varying densities. In this method, the local density of each data object is measured relative to its k nearest neighbors: the ratio of an object's local density to the density of its surrounding objects is taken as the object's relative density, which allows density peak points in low-density regions to be identified effectively. In addition, the concept of mutual neighbors is used to define the neighborhood relationship between data points, which reduces the direct influence of the parameter k on each point's local density. Finally, the points left unassigned during cluster generation are identified and allocated from the perspective of the clusters.

Second, representative-point-based clustering methods use a single data point to represent a cluster, so if that representative point is wrong, the whole cluster is misallocated. This thesis therefore proposes using cluster cores based on relative density as cluster representatives (CCS-BRD). The high-density points are the local density peaks of a cluster, and together they form the cluster's dense region, which is called the cluster core. As long as the cluster cores are clustered correctly, the result for the whole cluster is unlikely to be poor. Because boundary points are excluded when the cores are clustered, the distances between cluster cores are larger than those between the full clusters, which makes the data easier to cluster. After the core of each cluster is obtained, the boundary points are allocated according to the distribution of the cluster cores.

Both algorithms are tested on synthetic and real data sets; the clustering results are evaluated from different perspectives using several evaluation indexes and compared with those of other algorithms. The experimental results show that both methods achieve superior performance. The two algorithms are also compared with each other: RDMN performs better than CCS-BRD on data sets with uneven density, while CCS-BRD better distinguishes the boundaries between clusters.
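
The RDMN density measure described above can be sketched in Python. This is a minimal brute-force illustration under stated assumptions, not the thesis's implementation: the abstract does not give the exact local-density formula, so the hypothetical `local_density` uses the inverse of the mean distance to the k nearest neighbors. Relative density is that value divided by the mean local density of the same neighbors, and two points are mutual neighbors when each is among the other's k nearest neighbors. All function names are illustrative.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_indices(points: Sequence[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbors of every point (the point itself excluded).

    Brute force: every pairwise distance is computed, so this suits small toy data only.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Sort (distance, index) pairs; ties are broken by the smaller index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def local_density(points: Sequence[Point], knn: List[List[int]]) -> List[float]:
    """Hypothetical local density: inverse of the mean distance to the k nearest neighbors."""
    density = []
    for i, nbrs in enumerate(knn):
        mean_dist = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
        density.append(1.0 / (mean_dist + 1e-12))  # epsilon avoids division by zero for duplicates
    return density


def relative_density(density: List[float], knn: List[List[int]]) -> List[float]:
    """Relative density: a point's local density divided by the mean local density of its kNN."""
    return [
        density[i] / (sum(density[j] for j in nbrs) / len(nbrs))
        for i, nbrs in enumerate(knn)
    ]


def mutual_neighbors(knn: List[List[int]]) -> List[Set[int]]:
    """j is a mutual neighbor of i when each point is among the other's k nearest neighbors."""
    knn_sets = [set(nbrs) for nbrs in knn]
    return [{j for j in knn_sets[i] if i in knn_sets[j]} for i in range(len(knn))]


if __name__ == "__main__":
    # Toy data: a tight cluster, a cluster ten times more spread out, and one isolated point.
    pts = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
           (10.0, 10.0), (11.0, 10.0), (9.0, 10.0), (10.0, 11.0), (10.0, 9.0),
           (5.0, 5.0)]
    knn = knn_indices(pts, k=4)
    dens = local_density(pts, knn)
    rel = relative_density(dens, knn)
    print(round(dens[0], 1), round(dens[5], 1))  # → 10.0 1.0
    print(round(rel[0], 3), round(rel[5], 3))    # → 1.457 1.457
    print(mutual_neighbors(knn)[10])             # → set()
```

Because relative density is a ratio, it does not depend on the scale of a cluster: in the toy data the two cluster centers differ tenfold in local density but share the same relative density, which is the property the abstract relies on to find density peaks in low-density regions. The isolated point still has k nearest neighbors, but no mutual neighbors.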
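
The CCS-BRD idea of clustering cores first and boundary points afterwards can be sketched as follows. The abstract does not specify how cores are selected, grouped, or used to allocate boundary points, so this sketch rests on three labeled assumptions: a point is a core point if its relative density is at least a hypothetical `threshold`; core points within a hypothetical distance `link_dist` of each other are merged by single linkage (union-find); and each remaining boundary point joins the cluster of its nearest core point. Relative densities are taken as input, so any relative-density estimate can be plugged in. The function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _root(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def core_then_boundary(points: Sequence[Point], rel: Sequence[float],
                       threshold: float, link_dist: float) -> List[int]:
    """Label every point by clustering core points first, then boundary points.

    Assumed rules (hypothetical, not taken from the thesis):
      1. core points are those with relative density >= threshold;
      2. core points within link_dist of each other are merged (single linkage);
      3. each non-core (boundary) point joins the cluster of its nearest core point.
    """
    cores = [i for i, r in enumerate(rel) if r >= threshold]
    if not cores:
        raise ValueError("no point reaches the core threshold")

    parent = list(range(len(points)))
    for pos, a in enumerate(cores):
        for b in cores[pos + 1:]:
            if math.dist(points[a], points[b]) <= link_dist:
                parent[_root(parent, a)] = _root(parent, b)

    # Number the core components 0, 1, 2, ... in order of first appearance.
    label_of_root: Dict[int, int] = {}
    labels = [-1] * len(points)
    for c in cores:
        labels[c] = label_of_root.setdefault(_root(parent, c), len(label_of_root))

    # Boundary allocation: the nearest core point decides the cluster.
    for i in range(len(points)):
        if labels[i] == -1:
            nearest = min(cores, key=lambda c: math.dist(points[i], points[c]))
            labels[i] = labels[nearest]
    return labels


if __name__ == "__main__":
    # Two dense pairs joined by a low-density bridge of points along the x-axis.
    pts = [(0.0, 0.0), (0.5, 0.0), (1.2, 0.0), (1.8, 0.0), (2.4, 0.0), (3.0, 0.0), (3.5, 0.0)]
    rel = [1.3, 1.2, 0.6, 0.5, 0.6, 1.2, 1.3]  # made-up relative densities for illustration
    print(core_then_boundary(pts, rel, threshold=1.0, link_dist=1.0))  # → [0, 0, 0, 1, 1, 1, 1]
```

In the toy chain, single linkage over all points with the same `link_dist` would merge everything into one cluster, because the low-density bridge points connect the two groups. Restricting the linkage to core points keeps the groups apart, and the bridge points are then split between them by proximity, which mirrors the abstract's argument that cores are easier to separate than whole clusters.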
Keywords: Density-based clustering, Relative density, Mutual neighbor, Uneven density