Population data are fundamental to the study of social and economic development, and authoritative population data come mainly from the censuses and population sample surveys conducted by national statistical departments. Census data, however, suffer from a long update cycle and coarse spatial granularity. Census data should therefore be developed further to support fine-scale spatial inference of urban population and to build a high-precision population grid data set that can be updated dynamically. Such a data set is of great significance for understanding the characteristics of population distribution, supporting urban planning and construction, and improving comprehensive social governance.

Recent research on population spatialization mainly uses either machine learning models, which capture the nonlinear effects of the factors that influence population distribution, or geographically weighted regression (GWR) models, which capture the spatial heterogeneity of population distribution. In reality the two effects usually exist at the same time, and analyzing them separately leaves the characterization of population spatial distribution incomplete and lowers simulation accuracy. Few population spatialization studies have addressed this issue. This study therefore draws on both traditional population spatialization data and new geographic big data, and adopts an ensemble learning approach that combines geographically weighted regression with machine learning models.

The study first summarizes and reviews in detail the basic data, methods and models of population spatialization, which lays the theoretical foundation for the work. The empirical part is divided into two stages. In the first stage, the whole of Chengdu is taken as the study area. Population distribution is described by land use, night-time light brightness, point-of-interest (POI) kernel density and road network density, which reflect the level of economic and social development, and by elevation, slope and topographic relief, which reflect natural conditions. With the township- and subdistrict-level population data for Chengdu from the Seventh National Population Census (2020) as the dependent variable, models were built with random forest, XGBoost, weighted-average stacking, multiple-linear-regression stacking and geographically weighted regression stacking, respectively, to generate a 1 km population grid data set, and the fitting accuracy of each model was verified quantitatively and qualitatively. In the second stage, on the one hand, the trend in Baidu population heat values within local units is extracted and a 1 km grid data set that is updated hourly is constructed; on the other hand, urban residential area data, building vector data and location-based big data are used to downscale the 1 km grid population data to the building scale. In this way the thesis explores how to improve the spatial and temporal resolution of population grid data sets for metropolitan areas.

The main conclusions are as follows. First, regarding methods and models, the empirical results show that, at the township and subdistrict scale in large and medium-sized cities, the models adopted in this study fit population distribution better than the WorldPop data set. Among them, the ensemble model that combines machine learning with geographically weighted regression, which builds both the spatial correlation and the spatial heterogeneity of population distribution into the model, performs best: its results are superior to those of the single models and the conventional ensemble models in both detail and fitting accuracy, making it a suitable method for improving population spatialization in large and medium-sized cities. Specifically, relative to the WorldPop data set, the generated 1 km population grid data set reduces the mean absolute error between fitted and actual population density by 55.34% and the root-mean-square error by 74.83%, a significant gain in accuracy.

Second, regarding spatial and temporal resolution: in the spatial dimension, population distribution at the high-precision building scale is derived from urban residential area data and building vector data, with good fitting results. In the temporal dimension, Baidu population heat values were used to obtain the intra-day trend of population change in Chengdu, from which the spatial distribution and change of the total urban population were derived; the mean relative error of the fitted total population was only 2.83%. The Spearman correlation coefficient between the fitted trend of total population and the trend of the actual heat values was checked district by district and county by county. The results show that dynamically updating grid population data with heat values works better in urban centers and suburbs where population is highly concentrated and mobility is frequent.

In summary, this study finds that modeling with machine learning combined with geographically weighted regression, based on multi-source geographic data, urban residential area data, building vector data and new location-based big data, is an effective method for population spatialization at high spatial and temporal resolution.
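
To make the idea of geographically weighted regression stacking more concrete, the following is a minimal Python sketch, not the implementation used in the study. It assumes two base learners (for example a random forest and XGBoost) have already produced population density predictions for each calibration unit, and it combines them with coefficients fitted locally by weighted least squares, so that the blend can vary across space. The function names, the Gaussian kernel, the bandwidth of 2 km, the omission of an intercept and all toy values are assumptions made purely for illustration.

```python
import math

def gaussian_weights(target, coords, bandwidth):
    """Gaussian kernel: calibration units nearer to the target weigh more."""
    return [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2)
            for c in coords]

def local_stack_coefficients(target, coords, pred_a, pred_b, observed, bandwidth):
    """Locally weighted least squares (no intercept, an illustrative choice) for
    observed ~ coef_a * pred_a + coef_b * pred_b, calibrated around `target`."""
    w = gaussian_weights(target, coords, bandwidth)
    # Weighted sums that make up the 2x2 normal equations.
    s_aa = sum(wi * a * a for wi, a in zip(w, pred_a))
    s_ab = sum(wi * a * b for wi, a, b in zip(w, pred_a, pred_b))
    s_bb = sum(wi * b * b for wi, b in zip(w, pred_b))
    t_a = sum(wi * a * y for wi, a, y in zip(w, pred_a, observed))
    t_b = sum(wi * b * y for wi, b, y in zip(w, pred_b, observed))
    det = s_aa * s_bb - s_ab ** 2
    if abs(det) < 1e-12:
        raise ValueError("base predictions are locally collinear; widen the bandwidth")
    coef_a = (t_a * s_bb - t_b * s_ab) / det
    coef_b = (s_aa * t_b - s_ab * t_a) / det
    return coef_a, coef_b

def stacked_prediction(target, target_a, target_b,
                       coords, pred_a, pred_b, observed, bandwidth):
    """Blend the two base predictions at `target` with its local coefficients."""
    coef_a, coef_b = local_stack_coefficients(
        target, coords, pred_a, pred_b, observed, bandwidth)
    return coef_a * target_a + coef_b * target_b

# Hypothetical toy data: unit centroids (km), two base-model predictions and
# the census density for each calibration unit (persons per square km).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0)]
rf_pred = [100.0, 200.0, 150.0, 400.0, 300.0]
xgb_pred = [120.0, 180.0, 170.0, 350.0, 330.0]
census = [112.0, 190.0, 161.0, 372.0, 318.0]

if __name__ == "__main__":
    for site in [(0.5, 0.5), (3.5, 3.0)]:
        print(site, local_stack_coefficients(
            site, coords, rf_pred, xgb_pred, census, bandwidth=2.0))
```

Because the kernel weights depend on the target location, the two sites in the usage example generally receive different blending coefficients, which is how spatial heterogeneity enters the ensemble. A global stacking model (weighted averaging or multiple linear regression) corresponds to the limiting case of a very large bandwidth, where every calibration unit receives nearly the same weight.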