Research On Mining Taxi Pick-up Hotspots Area Based On Big Data Hadoop Platform

Posted on: 2017-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Wang
GTID: 2308330482487127
Subject: Traffic Information Engineering & Control
Abstract/Summary:
With the development of the national economy and the advance of urbanization, the taxi has become an important mode of urban public transportation, and the number of taxis keeps growing. Because taxis are equipped with GPS terminals, they send real-time status information to the taxi dispatch center, such as longitude and latitude, speed, and passenger status. Over time, the dispatch center accumulates a large volume of taxi data, and extracting useful information from it has become an active research area. By processing and cluster mining the taxi data, we identify taxi pick-up hotspot areas, which provides auxiliary information and decision support for taxi scheduling and management and improves taxi utilization.

Traditionally, taxi data processing and pick-up hotspot mining run on a single computer, whose configuration and performance limit both the number of taxis that can be handled and the computing speed. Hadoop big data technology addresses the storage and computation bottleneck for large data volumes, making it feasible to process and mine data from a large number of taxis.

This paper studies taxi pick-up hotspot areas based on the Hadoop big data platform. The main work is as follows:

Firstly, we build a fully distributed Hadoop cluster as an experimental platform under laboratory conditions, including deployment of the hardware and software environments. Sorting and retrieval experiments are designed to compare the performance of the cluster with that of a single computer. The results show that the cluster is better suited to analyzing and processing massive data, and that its advantage grows with the data volume.

Secondly, because the taxi data is disorganized and contains a large number of abnormal records, it must be preprocessed. Working with 500 GB of data generated by fourteen thousand taxis in Beijing, this paper uses the Hadoop cluster platform for taxi data preprocessing. We first upload the original taxi data to the Hadoop cluster, and then design processing programs based on the MapReduce computing framework to clean the data, perform a secondary sort, and extract the pick-up position information.

Thirdly, we study the K-Means clustering algorithm on the big data platform and design an improved parallel K-Means clustering algorithm based on the MapReduce computing framework. Experimental analysis verifies that the designed algorithm has good parallel performance. We then use this algorithm to cluster the pick-up points in order to mine the taxi pick-up hotspot areas.

Finally, we use ArcGIS software to visualize the taxi pick-up hotspot areas and analyze them against a real map of Beijing.
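To illustrate the idea of K-Means expressed in MapReduce terms, the following is a minimal single-process sketch in Python. It shows standard K-Means only, not the improved variant designed in the thesis, which the abstract does not describe. The function names (`map_phase`, `combine`, `reduce_phase`, `kmeans`) and the sample coordinates are hypothetical, and planar Euclidean distance on (longitude, latitude) pairs is assumed as a simplification.

```python
from collections import defaultdict
from math import hypot


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: hypot(point[0] - centroids[i][0],
                            point[1] - centroids[i][1]),
    )


def map_phase(points, centroids):
    """Map: emit (cluster index, (x, y, 1)) for every pick-up point."""
    for x, y in points:
        yield nearest((x, y), centroids), (x, y, 1)


def combine(pairs):
    """Shuffle/combine: sum the partial (x, y, count) values per cluster."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for key, (x, y, n) in pairs:
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += n
    return sums


def reduce_phase(sums, centroids):
    """Reduce: each new centroid is the mean of its assigned points.

    A cluster that received no points keeps its previous centroid.
    """
    updated = list(centroids)
    for key, (sx, sy, n) in sums.items():
        updated[key] = (sx / n, sy / n)
    return updated


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map -> combine -> reduce until the centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(combine(map_phase(points, centroids)), centroids)
        shift = max(hypot(a[0] - b[0], a[1] - b[1])
                    for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids


# Hypothetical pick-up points and initial centroids, for illustration only.
pickups = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
hotspots = kmeans(pickups, [(0.0, 0.0), (10.0, 10.0)])
```

On a real cluster, each iteration would typically be one MapReduce job: mappers assign points to the current centroids, combiners pre-aggregate the sums locally to reduce network traffic, and reducers compute the new centroids that are passed to the next iteration.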
Keywords/Search Tags: Taxi, Big data Hadoop, Passenger hotspot, Parallel K-Means clustering