| Intelligent transportation systems greatly improve traffic management, and short-term traffic flow forecasting strengthens their functions in traffic guidance, signal allocation, and information release. Short-term traffic flow is easily disturbed and fluctuates. Many modeling methods in the literature select a subset of the influencing factors for analysis, but in general the performance of existing results in handling uncertain and nonlinear factors falls far short of practical application requirements. Their reliability still falls short, and their portability is poor: when the number of lanes or the traffic distribution at an intersection changes, the factors usually have to be re-selected and the model rebuilt. With the growth of computing power and the emergence of ensemble algorithms, it has become possible to use multi-source, multi-dimensional big data to support general-purpose prediction models. Numerous studies show that customized machine learning algorithms can build widely applicable models, which is of great help to short-term traffic flow prediction. This paper uses support vector regression (SVR), random forest (RF), and the light gradient boosting machine (LightGBM) to establish traffic flow prediction models for different traffic conditions. A top-level clustering algorithm identifies the traffic conditions to which each model applies, which improves the predictive ability of the overall system. The main work and research results of this paper are as follows: 1. Based on an analysis of the fixed and random factors that affect traffic flow, a complete feature set is constructed; after data preprocessing, including missing-data filling, speed correction, and abnormal-data handling, the existing traffic checkpoint ("bayonet") data are aggregated into the data set corresponding to these features. 2. Based on SVR and RF respectively, the vehicle 
flow prediction models are established, and the random search algorithm is improved by supplying a search direction and then used to tune the models' hyperparameters. For the SVR prediction model, the feature subset is selected by correlation analysis; for the RF prediction model, feature variables are selected dynamically according to a fixed cumulative-importance value, which effectively reduces model complexity. With the new feature subsets, model training speed increases by 31.43% and 42.2%, respectively. Common practical problems, such as the loss of some important features when detection equipment fails during operation, are also studied: the prediction accuracy of the RF model drops by 1.5%, which indicates a certain degree of stability. 3. The application of LightGBM to short-term traffic flow prediction on urban roads is explored. The particle swarm algorithm is applied to the model's hyperparameter optimization, with a slight modification to its velocity update rule to improve convergence speed. The resulting SPSOLightGBM prediction model is proposed and its performance is verified on the validation set. The highest accuracy achieved in this paper is 91.17%, with an average traffic flow prediction error of 6.32. Finally, the model's difficulty in parameter tuning is analyzed through comparative verification, its shortcoming of being strongly affected by outliers is identified, and its applicable scenarios are given. 4. A high-dimensional clustering algorithm is used to classify different traffic states; the statistics of selected features within each cluster are interpreted as actual traffic states, the performance of the models is compared across traffic states, and guidance on applying the different models is given. That is, the support vector machine is suitable when the dataset is small and the 
data changes smoothly; the random forest runs faster on large-scale data and outperforms the support vector machine when the data fluctuates; gradient boosting is more effective and more accurate when the data volume is large, but it is easily affected by outliers and its hyperparameter tuning is more complicated. |
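
The abstract does not spell out how feature variables are selected by cumulative importance for the RF model. The following is a minimal sketch of one common reading of that idea, not the authors' implementation: rank features by importance and keep the smallest leading set whose normalized importances reach a fixed threshold. The function name `select_by_cumulative_importance`, the example scores, and the 0.85 threshold are all hypothetical; in practice the scores could come from a trained forest, for example the `feature_importances_` attribute of scikit-learn's `RandomForestRegressor`.

```python
from typing import List, Sequence


def select_by_cumulative_importance(
    importances: Sequence[float], threshold: float = 0.95
) -> List[int]:
    """Return indices of the top-ranked features whose normalized
    importances together first reach `threshold` of the total.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    total = float(sum(importances))
    if total <= 0.0:
        raise ValueError("importances must sum to a positive value")

    # Rank feature indices from most to least important.
    ranked = sorted(
        range(len(importances)), key=lambda i: importances[i], reverse=True
    )

    selected: List[int] = []
    cumulative = 0.0
    for idx in ranked:
        selected.append(idx)
        cumulative += importances[idx] / total
        if cumulative >= threshold:
            break  # enough importance covered; drop the remaining features
    return selected


# Made-up importance scores for five illustrative features.
scores = [50, 5, 30, 10, 5]
kept = select_by_cumulative_importance(scores, threshold=0.85)
print(kept)  # indices of the retained features, most important first
```

Retraining on only the retained columns is what would reduce model complexity and training time; how much depends on the data and the chosen threshold.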