
Research And Application Of Clustering Algorithm Based On DBSCAN

Posted on: 2017-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: Z H Feng
GTID: 2308330488482279 | Subject: Software engineering
Abstract/Summary:
With the rapid spread of the mobile Internet and mobile devices worldwide, many traditional industries face unprecedented challenges. As the era of big data has matured in recent years, industries have paid increasing attention to data mining in building their information systems. Cluster analysis is one of the main research directions in data mining and is widely used in data analysis, image processing, machine learning and other fields. Density-based clustering algorithms do not require the number of clusters to be specified in advance. They can find clusters of arbitrary number and shape in noisy data. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is the classic density-based algorithm and one of the most widely applied in cluster analysis.

Building on a study of DBSCAN, this thesis addresses the poorly balanced distribution and inefficient dispatching of taxis. Taxi pick-up (passenger hotspot) data are unevenly distributed and very large. To handle this, the thesis proposes an improved multi-density DBSCAN for detecting taxi-passenger hotspots. Combined with the map services of mobile platforms, a mobile taxi-passenger hotspot detection system was designed and implemented to guide taxi distribution and dispatching.

The main research work is as follows.

Firstly, DBSCAN cannot cluster datasets of varying density and is sensitive to its input parameters. To address this, the thesis proposes an improved DBSCAN based on a greedy strategy, called Greedy DBSCAN. It requires only one input parameter, MinPts, and uses the greedy strategy to determine the neighborhood radius Eps adaptively. It also uses relative density to identify noise points.
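The abstract does not state exactly how Greedy DBSCAN chooses Eps or defines relative density. The Python sketch below is one hypothetical reading, not the thesis's algorithm. It makes the following assumptions:

* Relative density is taken as the mean k-distance of a point's MinPts nearest neighbours divided by the point's own k-distance.
* A point is treated as noise when that ratio falls below a hypothetical `noise_ratio` (0.5).
* Clusters are seeded from the densest unassigned point. The thesis describes a random search for core objects instead.
* Each cluster uses its own Eps, equal to the seed's k-distance times a hypothetical `scale` (1.5).

The neighbour search is brute force, O(n^2), to keep the sketch short.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (excluding i), nearest first."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def greedy_dbscan_sketch(points, min_pts, scale=1.5, noise_ratio=0.5):
    """Illustrative multi-density DBSCAN whose only required input is min_pts.

    HYPOTHETICAL rules (not taken from the thesis):
    - noise: relative density (mean k-distance of the k nearest neighbours
      divided by the point's own k-distance) below `noise_ratio`;
    - greedy seeding: densest unassigned point first, with a per-cluster
      Eps = seed k-distance * `scale`.
    Assumes len(points) > min_pts. Returns one label per point; -1 is noise.
    """
    n = len(points)
    neigh = [knn(points, i, min_pts) for i in range(n)]
    kdist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    labels = [None] * n
    for i in range(n):
        mean_neigh = sum(kdist[j] for j in neigh[i]) / min_pts
        if kdist[i] > 0 and mean_neigh / kdist[i] < noise_ratio:
            labels[i] = -1  # much sparser than its own neighbourhood: noise

    cluster_id = 0
    # Greedy order: points with the smallest k-distance (densest) seed first.
    for seed in sorted(range(n), key=lambda idx: kdist[idx]):
        if labels[seed] is not None:
            continue
        eps = kdist[seed] * scale  # local Eps adapted to this seed's density
        labels[seed] = cluster_id
        # With scale >= 1 the seed's eps-neighbourhood holds itself plus its
        # min_pts nearest neighbours, so the seed is always a core point.
        queue = [j for j in region_query(points, seed, eps) if j != seed]
        while queue:
            j = queue.pop()
            if labels[j] is not None:
                continue  # already clustered, or flagged as noise
            labels[j] = cluster_id
            nbrs = region_query(points, j, eps)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster_id += 1
    return labels


dense = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
sparse = [(10.0 + x, 10.0 + y) for x in range(5) for y in range(5)]
labels = greedy_dbscan_sketch(dense + sparse + [(50.0, 50.0)], min_pts=4)
print(labels)
```

In this toy run the two grids have spacings that differ by a factor of ten, and each receives its own Eps (about 0.15 and 1.5). As a result, the 25 dense points are labelled 0, the 25 sparse points 1, and the isolated point -1.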
To improve efficiency, Greedy DBSCAN performs neighborhood queries while randomly searching for core objects. It then produces the final clustering by merging the resulting clusters. Experiments show that Greedy DBSCAN separates noise effectively and identifies multi-density clusters with higher accuracy.

Secondly, to further improve the efficiency of Greedy DBSCAN on large datasets, this thesis proposes a Greedy DBSCAN variant based on reservoir sampling. Its sampling rate is determined from an analysis of the optimal sample size. Simulations show that the algorithm handles large datasets containing irregularly shaped clusters of different densities.

In addition, the raw taxi pick-up data were preprocessed with the WEKA data mining tool to extract the active hotspot data. The reservoir-sampling Greedy DBSCAN was then applied to five days of raw GPS data from 12,000 Beijing taxis. This experiment verifies its effectiveness in discovering and predicting taxi-passenger hotspots.

Finally, the server side of the system follows a layered MVC architecture with REST-style resource design. It is built on the lightweight Spring MVC + Spring + Hibernate framework stack, and the Web front end uses the responsive Bootstrap framework. Pick-up hotspot data are distributed differently across time periods on weekdays and holidays, so in practice MinPts is adjusted for each case to obtain fine-grained clusters. The clustering results are shown as markers on the map of the mobile client to guide drivers to waiting passengers.
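The abstract names reservoir sampling but not the specific variant or the derived sample size. The Python sketch below assumes the classic Algorithm R. It draws a uniform random sample of `k` records from a stream of unknown length in a single pass, which is what makes reservoir sampling attractive for large GPS logs. The parameter `k` is a stand-in for the thesis's "optimal sample size", whose derivation is not given here.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R).

    Every item ends up in the result with equal probability; if the stream
    has fewer than k items, all of them are returned in order.
    """
    rng = rng or random.Random()
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, t)  # inclusive bounds: t + 1 equally likely slots
            if j < k:
                reservoir[j] = item  # item t is kept with probability k / (t + 1)
    return reservoir


# Example: keep 1,000 of one million records without storing the whole stream.
sample = reservoir_sample(range(1_000_000), k=1000, rng=random.Random(7))
print(len(sample))  # 1000
```

Memory use is O(k) regardless of stream length. Only the sample would then be passed to the clustering step, which is the efficiency gain the abstract describes.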
The system passed its tests, confirming that the proposed algorithms and technologies are effective for taxi-passenger hotspot detection.
Keywords/Search Tags: multi-density clustering, greedy DBSCAN, reservoir sampling, taxi pick-up hotspots