As the backbone of the public transportation system, urban rail transit carries the daily travel of most residents. Its rapid development has contributed greatly to relieving traffic congestion, but it also places higher demands on operation and management. As rail transit construction accelerates and the rail network becomes more complete, station passenger flow prediction has become a prerequisite for the rational planning, construction, and efficient operation and management of rail transit. Therefore, this paper collects point-of-interest (POI) and built environment information around rail stations and combines it with rail transit automatic fare collection (AFC) data to establish a passenger flow prediction model for rail stations based on multi-source data. The main research contents are as follows.

First, Python is used to clean the original rail AFC data and to extract station passenger flow data at different time granularities. The time-varying characteristics and periodic changes of passenger flow in different periods are summarized in the time dimension, and the influence of geographical location on passenger flow distribution is explored from a spatial perspective, so as to construct three passenger flow indicators: passenger flow volume, peak time, and tidal ratio. The K-means clustering algorithm is used to divide the rail stations into 7 passenger flow patterns. The characteristics of each pattern are analyzed against the actual conditions of the stations, and the passenger flow of rail stations is found to be related to the surrounding built environment.

Second, to study in detail the impact of the built environment around rail stations on station passenger flow, the attraction range of each rail station is determined by combining the topographic characteristics of the mountainous city, the station location, and the station connection environment, and ArcGIS is used to extract the spatial attraction range of each station. By combining web crawling with other data acquisition methods, urban multi-source data such as POI and built environment data are collected to establish a candidate set of factors affecting passenger flow at rail stations.

Finally, stepwise regression is used to screen out 9 key variables that affect the passenger flow of rail stations, and a multiple linear regression (OLS) model is constructed for an initial exploration of the relationship between station passenger flow and the built environment. On this basis, the geospatial location of each rail station is incorporated into the model, and a geographically weighted regression (GWR) model that accounts for spatial heterogeneity is established to predict station passenger flow. The random forest algorithm is used to rank the importance of the feature variables that affect station passenger flow, and the top-ranked variables that together account for 95% of the cumulative importance are selected to construct the random forest prediction model. The three prediction models are used to predict the passenger flow of the 7 types of stations with different passenger flow patterns in the Chongqing rail network. The results show that the predictions of the random forest model, which considers multiple feature variables, are closer to the actual values at the various rail stations, and that it outperforms the OLS model and the GWR model on all three evaluation indicators: mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). It shows high prediction accuracy and good prediction performance for rail station passenger flow. The prediction results can provide a scientific basis and data support for promoting the refined operation and management of urban rail transit.
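
The feature selection by cumulative importance and the three evaluation indicators mentioned above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the feature names, and the importance scores are hypothetical and chosen only for illustration. In practice the scores would likely come from a fitted random forest (for example, the `feature_importances_` attribute of a scikit-learn forest), and MAPE is assumed here to be reported as a percentage.

```python
import math


def select_by_cumulative_importance(importances, threshold=0.95):
    """Return feature names, most important first, up to and including the
    feature at which the cumulative share of importance reaches `threshold`."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


def evaluate(actual, predicted):
    """Compute MAE, MAPE (in percent) and RMSE for paired observations.
    Assumes every actual value is non-zero, as MAPE requires."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse}


# Hypothetical importance scores for built-environment variables.
example_importances = {
    "office_poi": 0.40,
    "residential_poi": 0.30,
    "bus_stops": 0.20,
    "road_density": 0.07,
    "parking_lots": 0.03,
}
kept_features = select_by_cumulative_importance(example_importances)

# Hypothetical actual vs. predicted station passenger flows.
scores = evaluate([100, 200, 400], [110, 180, 400])
```

Lower values of all three indicators mean a closer fit, so comparing the `scores` dictionaries of several models on the same stations is one simple way to rank them.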