| In the era of data technology, marked by the deep integration of big data and artificial intelligence, urban computing provides new ideas and approaches for solving urban problems such as traffic congestion. Building data-driven intelligent transportation systems (ITSs) is a critical task in establishing a data-centric smart city, and efficient, accurate traffic flow prediction is a key component of constructing ITSs. This thesis studies the theory, methods, and applications of short-term and real-time traffic flow prediction on a Hadoop distributed computing platform, using the MapReduce and Spark parallel processing frameworks with mobile trajectory (taxi) big data, to address the robustness, accuracy, and timeliness of traffic flow prediction. It can provide a theoretical basis and technical support for the dynamic supervision, warning control, and convenience services of transportation. The main contributions and innovations of this work are summarized as follows: 1. In the construction of the big data analysis platform: We build a Hadoop distributed computing platform (the big data analysis platform) based on the MapReduce and Spark parallel processing frameworks to solve the problems of distributed storage and parallel computing for mobile trajectory big data. More specifically, on this platform we preprocess large-scale mobile trajectory data, including data cleaning, data conversion, and data standardization. Moreover, we implement short-term traffic flow prediction on the MapReduce parallel computing framework. Finally, we conduct real-time traffic flow prediction on the Spark parallel computing framework. 2. In terms of short-term traffic flow prediction: A distributed WND-LSTM model based on MapReduce is proposed to predict short-term traffic flow. More specifically, on the Hadoop distributed computing platform, a MapReduce-based distributed modeling framework for traffic flow prediction is developed to
solve the storage and computation problems of large-scale traffic flow data processing. Meanwhile, the original taxi GPS trajectory data are processed to smooth the discrete data by removing outliers with Kalman filtering (KF). Furthermore, a distributed WND-LSTM model is put forward on the MapReduce framework; it uses the time window and the normal distribution to weight the LSTM neural networks by computing the cost, and then converts the cost into a time-series problem to predict traffic flow. Finally, we employ the proposed WND-LSTM model to predict the short-term traffic flow of Sanlihe East Road in Beijing, China, using real-world taxi GPS trajectory big data. 3. In terms of real-time traffic flow prediction: A distributed W-BiLSTM model based on Spark is presented to predict real-time traffic flow. More specifically, on the Spark distributed computing platform, the original taxi GPS trajectory data are processed to smooth the discrete data by removing abnormal points using KF. Moreover, a distributed W-BiLSTM model is developed on the Spark-based framework to predict real-time traffic flow. It uses the interaction between the time window and the adjacent road segments to weight the bidirectional LSTM by calculating the cost. The cost is converted into a time-series problem by forming a single time-varying variable, whose state at the next time interval is then predicted. Finally, we use the proposed W-BiLSTM model to predict the real-time traffic flow of Sanlihe East Road in Beijing, China, using real-world taxi GPS trajectory big data. |
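
The two preprocessing ideas named in the abstract, Kalman-filter smoothing of the flow series and normal-distribution weighting over a time window, can be illustrated with a minimal single-machine sketch. It is not the thesis's implementation: the random-walk state model, the variance values, the choice to center the weights on the most recent time step, and all function names (`kalman_smooth`, `gaussian_window_weights`, `weighted_window_value`) are assumptions made for illustration. The distributed MapReduce/Spark execution and the LSTM models are omitted.

```python
import math


def kalman_smooth(values, process_var=1.0, measurement_var=25.0):
    """Scalar Kalman filter with an assumed random-walk state model.

    process_var and measurement_var are illustrative values, not
    parameters taken from the thesis.
    """
    if not values:
        return []
    estimate = float(values[0])   # initial state: first observation
    error_var = measurement_var   # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in values[1:]:
        # Predict: the state is assumed to persist, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def gaussian_window_weights(window, sigma=1.0):
    """Normalized normal-distribution weights over a time window.

    Assumption: the bell curve is centered on the most recent step,
    so older observations in the window receive smaller weights.
    """
    raw = [math.exp(-0.5 * ((window - 1 - i) / sigma) ** 2) for i in range(window)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_window_value(values, sigma=1.0):
    """Collapse one time window into a single weighted value."""
    weights = gaussian_window_weights(len(values), sigma)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical vehicle counts per interval; 120 stands in for a GPS outlier.
counts = [40, 42, 41, 120, 43, 44]
smoothed = kalman_smooth(counts)
window_value = weighted_window_value(smoothed[-4:], sigma=1.5)
```

Note that a plain Kalman filter damps an outlier rather than deleting it; a pipeline that removes abnormal points outright would add an explicit rejection step, for example discarding measurements whose innovation exceeds a threshold.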