| With the rapid development of the mobile internet, large volumes of data are generated at every moment, data is growing explosively, and storage devices are becoming ever cheaper. Faced with such massive data, however, people find it increasingly difficult and time-consuming to locate the information relevant to them. Meanwhile, with the rise of artificial intelligence, the intelligent transportation system (ITS) has been proposed in the field of transportation. A prerequisite for operating an ITS is an accurate, real-time traffic flow prediction system. Such a system predicts the traffic flow for the next few minutes to tens of minutes, so that traffic lights and traffic control management systems can adjust road traffic conditions in advance, alleviating congestion and improving traffic efficiency. Accurate traffic flow prediction is therefore particularly critical in an ITS. To address the low accuracy and the delay of intelligent traffic flow forecasting, this thesis studies the underlying principles and implements an improved algorithm. It also implements a distributed algorithm on the Spark platform, which improves forecasting accuracy and reduces the algorithm's execution time. First, the thesis introduces the sources of the historical traffic data and preprocesses that data. Next, the road network is analyzed and modeled, the feature data of the road to be predicted are extracted, and a feature vector space is constructed; a linear regression algorithm is then implemented in Python to predict traffic flow, and its validity is verified with experimental data. Finally, the thesis studies current intelligent traffic flow models and improves the multivariate linear regression algorithm and the gradient descent algorithm. Using the big data technologies Spark and Hadoop to store and process massive traffic data, a distributed multivariate linear regression algorithm based on the Spark platform is designed and implemented, and its experimental scheme is described in detail and verified by testing. The experimental results show that the Spark-based multivariate linear regression algorithm effectively improves computational efficiency, and that the improved algorithm based on multiple initialization values and Boosting improves prediction accuracy. |
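
To make the core idea concrete, the following is a minimal single-machine sketch of multivariate linear regression trained by gradient descent from several initial values, keeping the run with the lowest training error. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the two toy features (scaled flows on an upstream road and on the target road in the previous interval), the learning rate, and the selection rule are all hypothetical, and neither the Boosting step nor the Spark distribution is shown.

```python
import random


def predict(weights, bias, features):
    """Linear model: bias + sum of weight * feature."""
    return bias + sum(w * x for w, x in zip(weights, features))


def mse(weights, bias, X, y):
    """Mean squared error of the model on the samples (X, y)."""
    return sum((predict(weights, bias, xi) - yi) ** 2
               for xi, yi in zip(X, y)) / len(y)


def gradient_descent(X, y, weights, bias, lr=0.1, epochs=2000):
    """Batch gradient descent on the MSE loss from a given starting point."""
    n, d = len(y), len(weights)
    weights = list(weights)
    for _ in range(epochs):
        errors = [predict(weights, bias, xi) - yi for xi, yi in zip(X, y)]
        grad_w = [2.0 / n * sum(e * xi[j] for e, xi in zip(errors, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def fit_multi_start(X, y, n_starts=5, seed=0, **kwargs):
    """Run gradient descent from several random initial values and
    return (loss, weights, bias) of the run with the lowest training MSE.
    The selection rule is an assumption made for this sketch."""
    rng = random.Random(seed)
    d = len(X[0])
    best = None
    for _ in range(n_starts):
        w0 = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        b0 = rng.uniform(-1.0, 1.0)
        w, b = gradient_descent(X, y, w0, b0, **kwargs)
        loss = mse(w, b, X, y)
        if best is None or loss < best[0]:
            best = (loss, w, b)
    return best


# Hypothetical toy data: each row is [upstream flow, target-road flow] in the
# previous interval, scaled to [0, 1]; the target is the next-interval flow.
X = [[0.2, 0.4], [0.5, 0.1], [0.9, 0.7], [0.3, 0.8], [0.6, 0.6], [0.1, 0.9]]
y = [0.5 * a + 0.3 * b + 0.1 for a, b in X]

best_loss, best_w, best_b = fit_multi_start(X, y)
next_flow = predict(best_w, best_b, [0.4, 0.5])
```

Because the squared-error loss of a linear model is convex, every start converges toward the same solution given enough iterations; in this sketch, restarting mainly guards against a poor start when the iteration budget is small.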