
MapReduce-Based Methodologies Of Mobile Trajectory Big Data Mining And Its Applications

Posted on: 2017-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D W Xia
GTID: 1108330509954507
Subject: Statistics

Abstract/Summary:
In the era of Data Technology (DT), shaped by "Internet+" and "Big Data x", big data has attracted great attention from industry, academia and government, and mobile trajectory big data analytics in particular is becoming a research hotspot in urban computing and smart cities. Traffic congestion, environmental pollution, energy shortages and other problems now seriously affect urban livability and sustainable development. Mining, analyzing and utilizing position trajectory data from mobile social networks offers new ways to address these urban issues. This thesis focuses on the timeliness, accuracy and robustness of MapReduce-based taxi trajectory big data mining and its applications, with the aim of providing a theoretical basis and technical support for the dynamic monitoring and early-warning control of complex transportation networks.

The main contributions of this work are summarized as follows:

(1) A MapReduce-based Parallel Frequent Pattern growth algorithm, MR-PFP, is proposed to analyze the spatial-temporal characteristics of taxi operations using strategies for processing massive numbers of small files. First, we implement three methods, i.e., Hadoop Archives (HAR), CombineFileInputFormat (CFIF) and Sequence Files (SF), to compensate for Hadoop's existing weaknesses in handling small files, and then propose two strategies based on a performance evaluation of their memory consumption and processing efficiency. Moreover, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and implement the optimized FP-growth algorithm in the MapReduce framework. Finally, we use MR-PFP to analyze the characteristics of taxi operations in both the spatial and temporal dimensions. The results demonstrate that MR-PFP outperforms the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.

(2) A MapReduce-based Parallel Three-Phase K-Means algorithm, Par3PKM, is proposed to address the traffic subarea division problem using large-scale taxi trajectories on a Hadoop distributed computing platform.
First, we propose a Distributed Traffic Sub-Area Division (DTSAD) method comprising parallel clustering and boundary identification. Furthermore, we adopt two distance metrics and three cluster initialization strategies to modify K-Means, and implement it in the MapReduce parallel processing framework through the Map, Combiner and Reduce functions. Finally, to build the traffic subareas, we present a boundary identification method that connects the borders of the clustering results for each cluster. The results indicate that, compared with Parallel Two-Phase K-Means (Par2PK-Means), Parallel Clustering LARge Applications (ParCLARA) and K-Means, Par3PKM achieves higher efficiency, accuracy, scalability and reliability. In particular, the division results are strongly consistent with actual traffic conditions.

(3) A distributed Spatial-Temporal Weighted K-Nearest Neighbor model, STW-KNN, is presented to forecast short-term traffic flow within a general MapReduce framework for distributed modeling on a Hadoop platform. First, we develop a general MapReduce Framework of distributed modeling for Traffic Flow Forecasting (MF-TFF) to solve the computation and storage problems that stand-alone learning models face when handling big trajectory data in a particular application. The framework is general enough to be used for other data-driven traffic prediction methods as well. Moreover, based on MF-TFF, we propose the STW-KNN model, which accounts for spatial-temporal correlation and weighting in terms of the upstream-downstream and past-future relations of traffic flow, with trend adjustment, by optimizing the search mechanism, including the state vector, similarity measure, prediction function and choice of K. Finally, STW-KNN is implemented on MapReduce for parallel prediction of short-term traffic flow.
Compared with K-Nearest Neighbor (KNN), Artificial Neural Networks (ANNs), Naive Bayes (NB), Random Forest (RF) and C4.5, STW-KNN improves accuracy by more than 89.71%, with Mean Absolute Percentage Error (MAPE) values between 3.34% and 6.00%, and also significantly improves the efficiency and scalability of short-term traffic flow forecasting.

(4) A MapReduce-based approach for Traffic Flow Prediction using Correlation analysis, TFPC, is developed to forecast traffic flow in real time. First, we propose a Real-time Prediction System (RPS) with two key modules, i.e., Online Parallel Prediction (OPP) and Offline Distributed Training (ODT). Furthermore, we build a robust parallel nearest neighbor optimization classifier, ParKNNO, which discovers correlation information among traffic flows and incorporates it into the classification process. Finally, we present a novel forecast calculation method that combines the current data observed in OPP with the classification results obtained from large-scale historical data in ODT to generate traffic flow predictions in real time. The results show that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., Autoregressive Integrated Moving Average (ARIMA), Naive Bayes (NB), Multilayer Perceptron Neural Networks (MLP-NN) and Nearest Neighbor (NN) regression, in terms of accuracy, which is improved by up to 90.07% in the best case, with an average MAPE of 5.53%. In addition, TFPC shows excellent speedup, scaleup and sizeup performance.
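Contribution (1) parallelizes FP-growth on MapReduce. Below is a minimal sketch, simulated in plain Python rather than on Hadoop, of the two passes that a PFP-style algorithm typically starts with: parallel counting of item supports to build the F-list, then a mapper that emits group-dependent transactions so that each reducer can run FP-growth on its own item group. It assumes each transaction is a collection of hypothetical items (for example grid cells or time slots visited by a taxi) and that the F-list is split into contiguous groups. The local FP-growth step and MR-PFP's SequenceFile input handling are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter


def count_items(splits, min_support):
    """Pass 1 (parallel counting): each mapper counts items in its split,
    counts are summed globally, and infrequent items are dropped.
    Returns the F-list: frequent items by descending support, ties by name."""
    total = Counter()
    for split in splits:          # one split = one mapper's input
        local = Counter()         # mapper-side (combiner) aggregation
        for transaction in split:
            local.update(set(transaction))
        total.update(local)       # reducer: global sum per item
    frequent = [(item, n) for item, n in total.items() if n >= min_support]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


def shard_transactions(splits, flist, num_groups):
    """Pass 2 mapper: emit group-dependent transactions.
    For each group, a transaction contributes its F-list-ordered prefix that
    ends at the last item belonging to that group."""
    rank = {item: i for i, item in enumerate(flist)}
    group_size = math.ceil(len(flist) / num_groups)
    group_of = {item: rank[item] // group_size for item in flist}
    shards = {g: [] for g in range(num_groups)}
    for split in splits:
        for transaction in split:
            items = sorted((i for i in set(transaction) if i in rank), key=rank.get)
            emitted = set()
            # Scan from the least frequent item so each group gets its longest prefix.
            for pos in range(len(items) - 1, -1, -1):
                group = group_of[items[pos]]
                if group not in emitted:
                    shards[group].append(items[: pos + 1])
                    emitted.add(group)
    return shards
```

Each shard is self-contained: a reducer builds a local FP-tree from its shard and mines patterns only for the items of its own group, so no communication between reducers is needed during mining.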
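Contribution (2) runs K-Means through Map, Combiner and Reduce functions. The following is a minimal sketch of one such iteration, simulated in plain Python rather than on Hadoop, assuming 2-D points (such as taxi GPS coordinates), Euclidean distance and caller-supplied initial centers. Par3PKM's three-phase structure, its two distance metrics and its three initialization strategies are not described in the abstract and are not reproduced here; all function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centers):
    """Index of the closest center (Euclidean distance is an assumed metric)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def map_phase(split, centers):
    """Map: emit (cluster_id, ((x, y), 1)) for each point in one input split."""
    for point in split:
        yield nearest(point, centers), (point, 1)


def combine_phase(pairs):
    """Combiner: sum coordinates and counts per cluster on the mapper side,
    so only one partial sum per cluster and split is shuffled."""
    partial = defaultdict(lambda: [0.0, 0.0, 0])
    for cid, ((x, y), n) in pairs:
        acc = partial[cid]
        acc[0] += x
        acc[1] += y
        acc[2] += n
    return [(cid, ((sx, sy), n)) for cid, (sx, sy, n) in partial.items()]


def reduce_phase(shuffled):
    """Reduce: merge the partial sums of each cluster into a new centroid."""
    centroids = {}
    for cid, values in shuffled.items():
        sx = sum(s[0] for s, _ in values)
        sy = sum(s[1] for s, _ in values)
        n = sum(count for _, count in values)
        centroids[cid] = (sx / n, sy / n)
    return centroids


def kmeans_iteration(splits, centers):
    """One MapReduce job: map + combine per split, group by key, then reduce."""
    shuffled = defaultdict(list)  # stands in for Hadoop's shuffle and sort
    for split in splits:
        for cid, value in combine_phase(map_phase(split, centers)):
            shuffled[cid].append(value)
    updated = reduce_phase(shuffled)
    # A cluster that received no points keeps its previous center.
    return [updated.get(i, c) for i, c in enumerate(centers)]


def kmeans(splits, centers, max_iter=20, tol=1e-6):
    """Driver: rerun the job until the centers stop moving (or max_iter)."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(splits, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

For example, `kmeans([[(0.0, 0.0), (10.0, 10.0)], [(0.0, 1.0), (10.0, 11.0)]], [(0.0, 0.0), (10.0, 10.0)])` converges to `[(0.0, 0.5), (10.0, 10.5)]`. The combiner is the key design choice: each split ships one partial sum per cluster instead of every point, which cuts shuffle traffic.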
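Contributions (3) and (4) both build on nearest-neighbor forecasting and report accuracy as MAPE, defined in the standard way as MAPE = (100/n) Σ |y_t - ŷ_t| / |y_t|. Below is a minimal, single-machine sketch of a weighted KNN forecaster, assuming a state vector that mixes upstream and past observations, hypothetical per-component weights in the distance, inverse-distance weighting of the K neighbors, and a simple trend adjustment. These choices are illustrative only; they are not the STW-KNN or ParKNNO definitions, which the abstract does not give, and the function names are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two state vectors.
    The per-component weights are hypothetical stand-ins for spatial-temporal weights."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_forecast(history, state, weights, k=3):
    """Forecast the next flow value for `state`.

    history: list of (state_vector, next_flow) pairs from historical data.
    state:   current state vector; its last component is the latest observed flow
             (a layout such as [upstream_t, own_t-1, own_t] is an assumption).
    """
    def dist(vec):
        return weighted_distance(vec, state, weights)

    neighbors = sorted(history, key=lambda h: dist(h[0]))[:k]
    eps = 1e-9  # keeps the inverse-distance weight finite for exact matches
    num = den = 0.0
    for vec, next_flow in neighbors:
        w = 1.0 / (dist(vec) + eps)
        # Trend adjustment (assumption): carry the neighbor's change over to
        # the current level instead of copying its absolute value.
        num += w * (state[-1] + (next_flow - vec[-1]))
        den += w
    return num / den


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

In a MapReduce setting, the distance computation over `history` is the part that would be split across mappers, with a reducer keeping the global top K; that parallel split is likewise an assumption rather than the thesis's stated design.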
Keywords/Search Tags: Big data analytics, MapReduce, Taxi GPS trajectory mining, Spatial-temporal feature extraction, Traffic subarea division, Traffic flow forecasting