| With the rapid development of computer technology and the widespread adoption of the Internet, the volume of business data is growing rapidly. How to analyze these vast amounts of data and convert them into useful, easily understood knowledge has become an important issue for every business. Research on this issue gave rise to data mining technology, which extracts valuable and comprehensible knowledge from large-scale data and is now widely applied in many fields. Cluster analysis is an important part of data mining. Among clustering methods, the density-based DBSCAN algorithm can discover clusters of arbitrary shape in noisy spatial data and has been widely used in spatial data mining. The emergence of cloud computing addresses the problem of storing and processing massive data in data mining. To store and process large data sets, cloud computing distributes storage and computation across a cluster composed of multiple storage and compute nodes. With the powerful storage and computing capacity that cloud computing provides, data mining has entered a period of rapid development based on cloud platforms. Taxis reflect the dynamic nature of a city, and with the rapid development of wireless communication technology it has become convenient to record taxi trajectories. Most taxis in this country are equipped with GPS terminals, and a large amount of trajectory data is produced every day. How to extract information useful to passengers and taxi drivers from this massive trajectory data has become an active research topic. First, this paper introduces cloud computing technology and studies the HDFS distributed file system and the MapReduce programming model of the Hadoop cloud platform. 
After surveying clustering algorithms and studying the density-based DBSCAN algorithm in depth, the paper parallelizes DBSCAN within the MapReduce framework. The parallel algorithm is implemented on Hadoop, and its time efficiency is verified on a self-built Hadoop platform. Second, drawing on a study of data mining systems and the literature on taxi trajectory mining, the paper proposes a cloud-based data mining platform for taxi trajectories, in which the data is stored in a distributed manner and mined by algorithms running on the cloud platform. From the perspectives of taxi drivers, passengers, and the government, respectively, mining taxi trajectory data can bring a variety of intelligent services to the modern city. Finally, by mining large-scale taxi trajectory data, the paper recommends passenger pick-up points as a service for taxi drivers. Massive trajectory data is mined offline and the results are stored separately by characteristic time period; when a taxi driver requests the service with a given location and the current time, nearby passenger hotspots are recommended to help the driver find passengers quickly and maximize returns. The recommendation application is validated through experiments on data from 13,798 taxis in Shenzhen. The experimental results show that the design of the recommendation application is feasible and practical. |
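
To make the density-based clustering idea concrete, the following is a minimal serial sketch of DBSCAN in Python. It is not the paper's MapReduce-parallelized implementation: the function names (`region_query`, `dbscan`), the parameter names `eps` and `min_pts`, the sample pick-up coordinates, and the use of plain Euclidean distance on planar coordinates (rather than a geodesic distance on GPS latitude/longitude) are all assumptions made for illustration.

```python
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Assign each point a cluster id (0, 1, ...) or NOISE.

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it; clusters grow outward from core points.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be re-labeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical pick-up locations on a planar grid (illustrative data only).
pickups = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10),
           (50, 50)]
print(dbscan(pickups, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

In this sketch the two dense groups of pick-up points become clusters 0 and 1, which play the role of candidate passenger hotspots, while the isolated point is labeled as noise. Each neighborhood query scans all points, so the cost grows quadratically with the data size, which is one reason a parallel formulation is attractive for massive trajectory data.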