
Research On The Data Mining Of Large Scale Taxi Tracks Based On The Relationship Between Supply And Demand Of Tax

Posted on: 2017-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: J B Lv
Full Text: PDF
GTID: 2348330488487697
Subject: Cartography and Geographic Information System
Abstract/Summary:
As urban populations grow and the pace of life accelerates, demand for taxis keeps rising. Difficulty hailing a taxi is a common urban problem, and it worsens on congested roads, during rush hour, on major holidays, and in bad weather. Studies have shown that this difficulty is largely a matter of relative rather than absolute undersupply: the total number of taxis meets what a city of that size requires of its fleet, yet many taxis travel the roads without passengers ("empty traveling") while would-be passengers cannot find a cab. This phenomenon is known as the relative imbalance of taxi supply and demand. Empty traveling raises drivers' costs and reduces their revenue, occupies road resources, and adds to environmental pollution and wasted energy. The relative supply problem concerns taxi supply at a specific time and place; in essence it is an information asymmetry between passengers and drivers, and it is microscopic and dynamic. How to correct the relative imbalance, avoid the waste of resources caused by empty traveling, and make hailing a taxi more efficient by reducing that information asymmetry is a problem that governments urgently need to solve. Almost all taxis are now equipped with GPS positioning devices, and they have accumulated a large volume of trajectory data containing much valuable information. This thesis studies trajectory data preprocessing and data mining, combining spatial analysis with data mining techniques to uncover the potential value in trajectory data.
The data mining is studied from two aspects. The first is a method for calculating the taxi empty rate from massive trajectory data. The second takes the perspective of taxi supply and demand: three kinds of feature points ("empty traveling" points, passenger pick-up points, and passenger drop-off points) are extracted from the trajectory data and subjected to hot spot analysis in order to characterize taxi supply and demand in space and time. The main research content of the thesis is as follows.
(1) Basic theory and related technology of the core Hadoop components: understanding the structure and principles of HDFS, a distributed file system; studying the abstract model of parallel programs, how such programs run, and the MapReduce workflow; and becoming familiar with Hive, a distributed data warehouse, and the basic framework of GIS Tools for Hadoop.
(2) Preprocessing methods for massive trajectory data: after summarizing a classification of trajectory data errors, a Hadoop-based preprocessing model for taxi trajectory data is proposed. The model uses Hive for statistical analysis of erroneous trajectory data and parallel programs for small-file merging and error-data handling.
(3) Trajectory data mining methods: a method for calculating the taxi empty rate from massive trajectory data is studied. For hot spot analysis of the feature points, a distributed core algorithm for feature point extraction and a distributed algorithm for feature point grid statistics are proposed. The Getis-Ord Gi* clustering algorithm is studied, the parameters of the Getis-Ord Gi* hot spot analysis tool are determined by experiment, and ArcGIS temporal data visualization is used to present the spatio-temporal hot spot data.
(4) Case study: a Hadoop cluster is built, and 9 days of trajectory data from 13799 taxis in Shenzhen are preprocessed and mined for spatial and temporal characteristics. The taxi empty rate is calculated and its distribution over time is analyzed. Hot spot analysis yields, for each time period, the hot spots of "empty traveling", of passenger pick-up, and of passenger drop-off. These are then analyzed further through macroscopic hot spot analysis, statistics of hot spots by district, and hot spot overlay analysis.
Experiments show that mining large-scale taxi trajectory data makes it possible to obtain the hot spots of "empty traveling", passenger pick-up, and passenger drop-off dynamically and in real time, and thereby to analyze how taxi supply and demand differ across time and space. This can effectively offset the information asymmetry between passengers and drivers and provide a reference for addressing the relative imbalance between taxi supply and demand.
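The feature point extraction, empty rate calculation, and grid statistics named in item (3) can be illustrated on a single machine. The following is only a minimal sketch, not the thesis's distributed implementation: the record layout `Fix`, the `occupied` flag, the time-based definition of the empty rate, and the 0.005 degree cell size are all assumptions introduced here for illustration.

```python
from collections import Counter, namedtuple

# Hypothetical record layout; the abstract does not list the actual GPS fields.
# t is a timestamp in seconds, occupied is 1 with a passenger and 0 without.
Fix = namedtuple("Fix", "taxi_id t lon lat occupied")

def extract_feature_points(track):
    """Classify one taxi's time-ordered fixes by comparing each fix with its predecessor.

    The first fix has no predecessor and is therefore not classified.
    """
    pickups, dropoffs, empty = [], [], []
    for prev, cur in zip(track, track[1:]):
        if prev.occupied == 0 and cur.occupied == 1:
            pickups.append(cur)       # status flips to occupied: pick-up point
        elif prev.occupied == 1 and cur.occupied == 0:
            dropoffs.append(cur)      # status flips to vacant: drop-off point
        elif cur.occupied == 0:
            empty.append(cur)         # still vacant: "empty traveling" point
    return pickups, dropoffs, empty

def empty_rate(track):
    """Assumed time-based empty rate: share of elapsed time spent vacant."""
    empty_time = total_time = 0
    for prev, cur in zip(track, track[1:]):
        dt = cur.t - prev.t
        total_time += dt
        if prev.occupied == 0:
            empty_time += dt
    return empty_time / total_time if total_time else 0.0

def grid_counts(points, cell_deg=0.005):
    """Count points per square grid cell whose side is cell_deg degrees."""
    return Counter((int(p.lon // cell_deg), int(p.lat // cell_deg)) for p in points)

# Usage with a made-up five-fix track.
track = [
    Fix("A", 0, 114.0511, 22.5411, 0),
    Fix("A", 60, 114.0512, 22.5412, 0),
    Fix("A", 120, 114.0513, 22.5413, 1),
    Fix("A", 180, 114.0611, 22.5511, 1),
    Fix("A", 240, 114.0711, 22.5611, 0),
]
pickups, dropoffs, empty = extract_feature_points(track)
rate = empty_rate(track)
cells = grid_counts(empty + pickups)
```

In a MapReduce setting the same logic would plausibly be keyed by taxi ID for extraction and by grid cell for counting, but the abstract does not describe how the thesis partitions the work. A distance-based empty rate (vacant kilometres over total kilometres) is an equally common definition and could be substituted.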
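The Getis-Ord Gi* statistic used for the hot spot analysis can likewise be sketched in a few lines. The thesis relies on the ArcGIS hot spot analysis tool with experimentally determined parameters that the abstract does not state, so the version below is an assumption-laden illustration: it uses binary weights over a square neighbourhood of grid cells (the cell itself included), and the function name `getis_ord_gi_star` and the `radius` parameter are hypothetical.

```python
import math

def getis_ord_gi_star(counts, radius=1):
    """Return {cell: Gi* z-score} for a dict mapping (col, row) -> point count.

    Assumes binary spatial weights: w_ij = 1 when cell j lies within `radius`
    cells of cell i (including i itself), otherwise 0.
    """
    n = len(counts)
    if n < 2:
        return {cell: 0.0 for cell in counts}
    values = list(counts.values())
    mean = sum(values) / n
    # Global standard deviation; max() guards against tiny negative rounding error.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = {}
    for (cx, cy) in counts:
        neigh = [counts[(x, y)]
                 for x in range(cx - radius, cx + radius + 1)
                 for y in range(cy - radius, cy + radius + 1)
                 if (x, y) in counts]
        w_sum = len(neigh)  # sum of w_ij, which equals sum of w_ij^2 for binary weights
        numerator = sum(neigh) - mean * w_sum
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores[(cx, cy)] = numerator / denominator if denominator > 0 else 0.0
    return scores

# Usage with a made-up 5 x 5 grid holding a single busy cell in the middle.
grid = {(x, y): 0 for x in range(5) for y in range(5)}
grid[(2, 2)] = 10
z = getis_ord_gi_star(grid)
```

Large positive scores mark cells surrounded by high counts (hot spots) and large negative scores mark cold spots. Whether a score is significant depends on the chosen confidence level and on any multiple-testing correction, neither of which this sketch covers.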
Keywords/Search Tags:Taxi trajectory, Hadoop, Data mining, Data preprocessing, Hot spot analysis