
Multi-scale Spatialization Of Urban Population Based On Random Forest Algorithm

Posted on: 2022-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2480306530997569
Subject: Cartography and Geographic Information System
Abstract/Summary:
Modern cities are densely populated, and population is the most dynamic and innovative element of a city. As urban management becomes more fine-grained, knowing the detailed spatial distribution of the population has become a basic condition for scientific and effective management. Chongqing is a megacity in western China. Its favorable location and policy advantages have attracted rapid population growth, which has strongly driven the city's development. However, jobs-housing imbalance and traffic congestion now challenge its sustainable development. Systematic monitoring of the urban population distribution can therefore provide a basic and reliable reference for the fine management of megacities.

Multi-scale spatialized population data serve the differing needs of many fields. In urban planning, high-resolution population distribution data help analyze the population coverage of public facilities and optimize the allocation of public resources. When sudden natural disasters or epidemics occur, government departments can adopt rescue and emergency response measures according to the population distribution in the affected areas and then assess the resulting human and economic losses. In ecological and environmental protection, the grid resolution of population distribution data strongly affects multi-scale ecosystem studies. This study therefore aims to explore a practical and reliable multi-scale population spatial modeling method suited to Chongqing.

Most existing population data are released by statistical departments and aggregated by administrative unit, so they cannot reflect the heterogeneity of the population distribution within each unit. It is therefore necessary to extract the factors that affect population distribution and build a model that downscales the statistical population data to grid-cell population counts at different spatial scales. Previous studies mostly built such models through spatial interpolation or multiple regression, and the resulting population data products were relatively coarse. With richer data sources and newer modeling techniques, research on population spatialization has made remarkable progress. Traditional data sources include land use data, nighttime light data and terrain data; newer sources include user data generated through Internet services and mobile devices. These richer sources reflect differences in population distribution more sensitively. Random forest is a relatively recent method in population spatial modeling, with the advantages of high modeling accuracy and strong predictive ability. As a machine learning method, it can also help researchers explore how the modeling factors affect population distribution, providing scientific support for formulating appropriate population management policies. Geographical phenomena have strong scale effects, and the spatial distribution of population varies greatly with grid scale. Few multi-scale studies of population spatialization exist at present, so this direction merits further exploration.

Building on previous research and the conditions of Chongqing, we choose central Chongqing as the study area and combine random forest regression with multi-source data fusion to establish a multi-scale urban population spatialization method. The method downscales the 2018 street- and township-level demographic data of the region. The multi-source geospatial data include 13 types of map point-of-interest (POI) data, Luojia1-01 nighttime light remote sensing data, residential land patch vector data, urban road data, digital terrain data and surveyed street population data. These data are first preprocessed and unified into raster data for fusion, then fed into a random forest regression model to train multi-scale population distribution weight layers, finally yielding a dataset of resident population distribution on residential land in the central city of Chongqing. The predictions are compared with the real population of some communities using two evaluation indexes, community-level error analysis and the coefficient of determination, and the best modeling scale is selected on this basis. Finally, the way the modeling factors affect population distribution is explained and analyzed in detail through three importance analysis methods and partial dependence plots of each factor.

The two main research results are as follows:

(1) Taking the 30m grid scale as an example, we extract the street-level mean of each variable as a modeling factor, take the logarithm of street population density as the dependent variable, and input them into the random forest regression model. We tune the number of decision trees and the number of modeling factors used by the random forest to obtain the most accurate population distribution weight estimation model. The weight layer is then multiplied by the total census population of each street, and the final output is the 30m grid-scale population data of central Chongqing. In the same way, we obtain population distribution layers at 50m, 100m, 200m, 300m, 400m, 500m, 600m, 700m, 800m, 900m and 1000m grid sizes. The results are verified directly against community-scale real population data. The error is smallest and the fit best at 100m spatial resolution (R² = 0.59, p < 0.01), so 100m is taken as the best population distribution grid cell size for the central city of Chongqing.

(2) After the model is trained and the final results obtained, the contribution of each influencing factor in the population estimation model is analyzed quantitatively with two indicators based on model importance and partial dependence analysis: feature importance and the change in predicted value. On the one hand, we compare the importance of all variables overall using different calculation methods; on the other hand, we examine each modeling variable in depth through the detailed effect of that single variable. We find that map POIs such as life-service, restaurant, residential and education facilities contribute more to the population distribution, nighttime light has less influence on the fine-scale population distribution, and the natural factors of elevation and slope are of weaker importance. When each variable is analyzed individually, population density increases with POI density but levels off beyond a certain level. Nighttime light has less influence on the predicted population value, and the response shows a trend of first increasing and then decreasing. Distance to the nearest road, elevation and slope are negatively correlated with population density.

This paper proposes a multi-scale urban population spatialization method based on a random forest regression model, which yields more intuitive and accurate population distribution data and enriches the methods for creating multi-scale population grid data. Because of limited modeling data sources, the research focuses on static population spatialization modeling. Given the strong mobility of the urban population, future research should add a temporal dimension and use richer spatio-temporal data sources to analyze the changing spatial distribution of population dynamically. Random forest can also be compared with other intelligent models to explore more efficient and practical methods for population spatialization.
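The downscaling step described in the abstract (predict a log population density for each grid cell, then multiply the weight layer by the street census total) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function `redistribute`, the stand-in `toy_model`, the feature name `poi_density` and all numbers are hypothetical. The normalisation of weights within each street is also an assumption, made here so that cell populations sum back to the census total. In practice the predictor would be a fitted random forest regressor rather than the toy function used here.

```python
import math
from collections import defaultdict


def redistribute(cells, street_totals, predict_log_density):
    """Allocate each street's census total to its grid cells.

    cells: list of (street_id, features) tuples, one per grid cell
    street_totals: dict mapping street_id -> census population
    predict_log_density: callable(features) -> predicted log population density
    """
    # Back-transform the log-density prediction into a positive weight per cell.
    weights = [math.exp(predict_log_density(feats)) for _, feats in cells]

    # Sum the weights per street so they can be normalised
    # (assumption: weights are rescaled within each street).
    weight_sum = defaultdict(float)
    for (street_id, _), w in zip(cells, weights):
        weight_sum[street_id] += w

    # Cell population = street total * the cell's share of the street's weight.
    return [
        street_totals[street_id] * w / weight_sum[street_id]
        for (street_id, _), w in zip(cells, weights)
    ]


# Hypothetical stand-in for a fitted model: log density grows with POI density.
def toy_model(features):
    return 0.5 * features["poi_density"]


# Three illustrative grid cells in two streets (made-up values).
cells = [
    ("A", {"poi_density": 0.0}),
    ("A", {"poi_density": 2.0}),
    ("B", {"poi_density": 1.0}),
]
street_totals = {"A": 1000.0, "B": 400.0}

grid_population = redistribute(cells, street_totals, toy_model)
for (street_id, _), pop in zip(cells, grid_population):
    print(street_id, round(pop, 1))
```

Because the weights are normalised per street, the gridded values preserve each street's census total, while cells with higher predicted density receive a larger share.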
Keywords/Search Tags: Random forest, Multi-scale, Multi-source data, Population spatialization, Chongqing