
Research On Grid Density Peak Clustering Algorithm And Urban Hot Spots Extraction

Posted on: 2020-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z N Na
GTID: 2428330599964246
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of big data, huge amounts of data are being generated in many fields, so it is important to analyze and mine the information hidden in the data to support decision-making. Data mining is an important way to analyze the hidden information in massive data sets: it can automatically discover patterns and predict likely outcomes. Clustering is an important data mining method with applications in image processing, pattern recognition and knowledge discovery. However, the rapid growth of data volume places high demands on both the accuracy and the running time of clustering algorithms, so these algorithms need to be continuously improved to handle large-scale data.

Density-based clustering algorithms are insensitive to outliers and can find non-spherical clusters. In 2014, Rodriguez and Laio proposed the density peak clustering algorithm (DPC), a density-based clustering algorithm that only needs two quantities for each point: the local density and the distance to the nearest point of higher density (the high-density distance). The clustering process is simple and the cluster centers do not need to be specified in advance, so the algorithm has been widely used. However, it must compute these quantities from the distances between all pairs of data points, so its time and space complexity are high and it is difficult to apply to large-scale data sets. This paper therefore introduces grid division and the K-nearest neighbor method, and proposes two improved density peak clustering algorithms: the Grid Density Peak Clustering Algorithm (GRID_DPC) and the Grid K-nearest Neighbor Density Peak Clustering Algorithm (GRID_KNN_DPC). The data space is divided into equal grid cells, a representative point is selected for each cell, and all subsequent operations are carried out on the grid representative points. Because the number of grid representative points is usually much smaller than the number of data points, the new algorithms reduce the time and space complexity of the original DPC algorithm and greatly improve computational efficiency.

Synthetic data sets are used to verify the effectiveness of the proposed algorithms. Comparisons with the original DPC algorithm, the Affinity Propagation algorithm, the K-centers algorithm and other classical clustering algorithms verify the superiority of the new algorithms on large-scale data sets. The improved algorithms are also applied to New York taxi data sets to extract and analyze urban hotspots. Urban hotspots have long been regarded as an important means of studying human mobility, and taxi GPS data has high application value, is easy to obtain, and is available in large volumes. Taxi data sets are therefore used to extract urban hotspots and study human mobility, and the results can provide guidance for travelers and improve the utilization of urban resources. This paper uses the new clustering algorithms to find and compare hotspots on weekdays and weekends. We also analyze hotspots during holidays and examine the connections between and within clusters. The results can help taxi drivers plan better routes and help address problems such as urban traffic congestion.
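The grid-then-DPC idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's GRID_DPC implementation, whose details are not given here. It assumes that each cell's representative is the mean of the points in the cell, that density is a cutoff-kernel count weighted by cell population, and that centers are the points with the largest product of density and high-density distance. The function names and the parameters cell_size, d_c and n_clusters are hypothetical, and the K-nearest neighbor variant is not covered.

```python
import numpy as np


def grid_representatives(points, cell_size):
    """Bin points into equal square cells; return one representative per non-empty cell.

    Assumption: the representative is the mean of the points in the cell.
    Returns (representatives, points per cell, cell index of every point).
    """
    cells = np.floor((points - points.min(axis=0)) / cell_size).astype(int)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # keep the index 1-D across NumPy versions
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate coordinates per cell
    return sums / counts[:, None], counts, inverse


def density_peaks(reps, weights, d_c):
    """Compute the two DPC quantities on the representatives only.

    rho:   weighted number of original points in cells within distance d_c
    delta: distance to the nearest representative of higher density
    """
    diff = reps[:, None, :] - reps[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    rho = ((dist < d_c) * weights[None, :]).sum(axis=1).astype(float)

    order = np.argsort(-rho, kind="stable")  # densest first
    delta = np.empty(len(reps))
    nearest_higher = np.full(len(reps), -1)
    # By convention the densest point gets the largest distance in its row.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, len(order)):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    return rho, delta, nearest_higher


def assign_clusters(rho, delta, nearest_higher, n_clusters):
    """Pick centers by largest rho * delta, then propagate labels downhill."""
    order = np.argsort(-rho, kind="stable")
    centers = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(len(rho), -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        # The densest point is always chosen as a center, so every other
        # point finds its nearest higher-density neighbor already labeled.
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = np.vstack([
        rng.normal([0.0, 0.0], 0.3, size=(500, 2)),
        rng.normal([5.0, 5.0], 0.3, size=(500, 2)),
    ])
    reps, counts, cell_of_point = grid_representatives(points, cell_size=0.5)
    rho, delta, nearest_higher = density_peaks(reps, counts, d_c=1.0)
    rep_labels = assign_clusters(rho, delta, nearest_higher, n_clusters=2)
    point_labels = rep_labels[cell_of_point]  # map labels back to the points
    print(len(points), "points ->", len(reps), "representatives")
```

In this sketch the pairwise distance matrix is built over the representatives rather than over all data points, which is where the saving in time and memory described above would come from. The choice of cell_size trades accuracy against the number of representatives.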
Keywords/Search Tags: Data Mining, Density Peak Clustering, Grid Division, Hot Spots Extraction