
Research On Density Clustering Algorithm Based On Reference Points

Posted on: 2019-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Liu
Full Text: PDF
GTID: 2428330548451868
Subject: Management Science and Engineering
Abstract/Summary:
Data mining uncovers latent, valuable knowledge in large volumes of data and gives new meaning to the data accumulated in the information age. Clustering, an important part of data mining, is widely used in data analysis, image processing, machine learning and other fields. Within cluster analysis, density-based clustering plays an important role in finance, marketing, information retrieval and information filtering, is widely applied across science and engineering, and is a focus of clustering research. As the classical density-based clustering algorithm, DBSCAN does not require the number of clusters to be specified in advance and can recognize any number of clusters of arbitrary shape in noisy data sets. However, DBSCAN has several shortcomings. Its time complexity is high, so it takes a long time to run. It depends heavily on two global parameters, Eps and MinPts, and changes in these parameters strongly affect the clustering results. Moreover, its results are poor on data sets of uneven density. Building on a study of the basic DBSCAN algorithm, this thesis addresses these shortcomings as follows:

1. To address the high time complexity of DBSCAN, a fast density clustering algorithm based on reference points is proposed. The new algorithm uses k reference points to reflect the distribution of the data. It retains the advantages of DBSCAN while reducing the number of region queries and the I/O cost. Theoretical analysis and experimental results show that the new algorithm clusters large-scale databases effectively and that its execution efficiency is clearly higher than that of the traditional DBSCAN algorithm based on the R*-tree.

2. To address DBSCAN's sensitivity to input parameters and its inability to cluster data sets with multiple density levels, a density clustering algorithm based on k-nearest neighbors and reference points is proposed. The algorithm finds clusters by querying the k-nearest neighbors of each point in the data set. Clustering starts from the center of a region (the point with the highest density in the region), and parameters for the degree of deviation and for density are introduced to reach the edge of the region. To improve the accuracy of density clustering, reference points are selected from a set of candidate reference points during cluster formation, which adds a reference-point selection step. The experimental results show that the algorithm not only finds clusters of arbitrary shape, size and density but also reduces the sensitivity of the clustering to input parameters and improves clustering quality and accuracy on data sets of non-uniform density.
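For orientation, the baseline that the abstract criticizes can be sketched as follows. This is a minimal textbook DBSCAN with brute-force region queries, not the reference-point algorithm proposed in the thesis, whose details the abstract does not give. The function names (`region_query`, `dbscan`), the query counter, and the sample points are assumptions made for illustration only. The sketch shows where the two global parameters Eps and MinPts enter, and that plain DBSCAN issues one region query per point, which is the cost the thesis aims to reduce.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with brute-force neighborhood search (illustrative only).

    Returns (labels, n_queries): one cluster label per point, and the number
    of region queries that were issued.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    n_queries = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        n_queries += 1
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not query again
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            n_queries += 1
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels, n_queries


# Hypothetical sample data: two dense squares and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels, n_queries = dbscan(sample, eps=1.5, min_pts=3)
print(labels, n_queries)
```

Because every point triggers exactly one region query and each brute-force query scans all points, this version runs in quadratic time. A spatial index such as the R*-tree mentioned in the abstract lowers the cost of each query but not the number of queries.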
Keywords/Search Tags: Data mining, Density-based clustering, DBSCAN algorithm, Reference point, K-nearest neighbor