An Improved Fast Density-based Clustering Algorithm For Mixed Data

Posted on: 2019-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Liang
Full Text: PDF
GTID: 2348330569989335
Subject: Applied statistics
Abstract/Summary:
The fast density-based clustering algorithm proposed by Rodriguez and Laio in 2014 (Algorithm RL) is widely used because it recognizes clusters regardless of their shape and lets the number of clusters be determined intuitively. For datasets that mix continuous and discrete variables, the distance measure between two data points is more complicated, and few researchers have devoted fast clustering algorithms to mixed data. Yet most real-life datasets are mixed, so we propose an improved fast density-based clustering algorithm for mixed data (Algorithm 2), which adapts "Clustering by fast search and find of density peaks" (Algorithm RL) to mixed data. Algorithm 2 defines a distance metric for mixed datasets and selects the possible cluster centers with a self-selection procedure (Algorithm 1); each remaining point is then assigned to the same cluster as its nearest neighbor of higher density. Because the cost of computing pairwise distances grows quadratically with the amount of data, Algorithm 2 is also proposed and explored with a sliding window model for clustering large mixed datasets, in order to reduce computational complexity and running time. The effectiveness of the algorithm is verified on several UCI datasets.
Keywords/Search Tags: mixed data, fast density-based clustering, big data, window model
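
The assignment rule summarized in the abstract (each non-center point joins the cluster of its nearest neighbor of higher density) can be illustrated with a minimal Python sketch. Several pieces here are assumptions, not taken from the thesis: the mixed distance is a Gower-style average (range-normalized absolute difference for continuous variables, simple mismatch for discrete ones), the density uses a cutoff radius `d_c`, and the centers are simply the `n_centers` points with the largest product of density and separation, which stands in for the thesis's self-selection procedure (Algorithm 1). The names `mixed_distance`, `density_peaks`, `d_c`, and `n_centers` are hypothetical.

```python
def mixed_distance(a, b, num_idx, cat_idx, ranges):
    """Gower-style distance for mixed records (an assumed metric, not the thesis's)."""
    total = 0.0
    for j in num_idx:
        # Continuous variable: absolute difference scaled by the observed range.
        total += abs(a[j] - b[j]) / ranges[j] if ranges[j] > 0 else 0.0
    for j in cat_idx:
        # Discrete variable: 0 if the categories match, 1 otherwise.
        total += 0.0 if a[j] == b[j] else 1.0
    return total / (len(num_idx) + len(cat_idx))


def density_peaks(points, num_idx, cat_idx, d_c, n_centers):
    """Density-peaks clustering sketch; returns one cluster label per point."""
    n = len(points)
    ranges = {j: max(p[j] for p in points) - min(p[j] for p in points)
              for j in num_idx}
    # Full pairwise distance matrix: this is the quadratic cost noted in the abstract.
    dist = [[mixed_distance(points[i], points[k], num_idx, cat_idx, ranges)
             for k in range(n)] for i in range(n)]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for k in range(n) if k != i and dist[i][k] < d_c)
           for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n      # distance to the nearest point of higher density
    parent = [-1] * n      # index of that nearest higher-density point
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:pos], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Assumed center rule: largest rho * delta. The densest point always ranks first.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_centers]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Each remaining point inherits the label of its nearest denser neighbor,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Toy mixed data: one continuous column (index 0) and one discrete column (index 1).
example = [(1.0, "a"), (1.1, "a"), (0.9, "a"),
           (5.0, "b"), (5.1, "b"), (4.9, "b")]
labels = density_peaks(example, num_idx=[0], cat_idx=[1], d_c=0.1, n_centers=2)
```

On this toy input the two groups should receive different labels. The sketch builds the full distance matrix, so it does not include the sliding window model that the abstract applies to large datasets.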