| With the rapid development of the real estate industry, home buying has become increasingly popular in recent years. Housing is the foundation of people's lives, and housing prices bear directly on the national economy and people's well-being. Housing prices have therefore gradually become a focus of social concern. The existing literature generally uses the traditional multivariate regression method to study housing prices. The disadvantage of this method is that it does not account for differences between communities or for each community's surrounding environment, so the accuracy of the housing valuation is difficult to guarantee. This paper therefore adds spatial effects to the traditional feature price (hedonic) model and uses machine learning methods to establish a more scientific and accurate valuation model for used housing. The improved spatial feature price model can help developers make better investment decisions and give consumers a more accurate reference when purchasing a house. The main contents of this paper are as follows. 1. A web crawler is used to collect 1,060 records from 132 communities in Panlong District, Kunming City; missing and dirty data are preprocessed; the concept of spatial differentiation is introduced into the traditional valuation model; and an improved spatial feature price model is proposed. 2. Several methods are used to predict the collected house price data. First, the relevant theory behind each method is introduced. Multivariate linear regression with spatial variables, the most traditional approach, is then used to analyze the data; the goodness of fit is found to be low and the F value small. Next, XGBoost and random forest methods are used to identify the feature variables that most strongly influence the unit price of a community. The optimal parameters are selected by 10-fold cross-validation. Finally, 70% of the data is taken as the training set and 30% as the test set, and support vector regression, random forest regression, and XGBoost regression models are fitted to obtain predicted unit prices. 3. The prediction errors on the test set and the goodness of fit of the models are compared, the main factors affecting the valuation model are analyzed, and relevant suggestions are made. The results show that the location of the community, the traffic conditions around it (such as the distance to the nearest subway station or bus station), the surrounding educational environment (the distance to the nearest school), and the age of the house have a great impact on housing prices. After extracting the variables with higher importance, we find that the random forest regression method has the lowest relative error and the XGBoost regression method has the highest goodness of fit. In other words, the random forest and XGBoost regression models that introduce spatial effects are better suited to studying used housing data for Panlong District, Kunming. |
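
The evaluation step described above (a 70/30 split, then comparing test-set relative error with goodness of fit) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the abstract does not give the exact error formula, so mean relative error is assumed here to be the average of |predicted − actual| / |actual|, and goodness of fit is assumed to be the coefficient of determination R². The function names, the random seed, and all price values are hypothetical and are not taken from the paper's data.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and return (train, test) in a 70/30 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_relative_error(actual, predicted):
    """Assumed definition: mean of |predicted - actual| / |actual|."""
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Splitting 1,060 record indices (the sample size stated in the abstract).
train_idx, test_idx = split_70_30(range(1060))

# Made-up unit prices and made-up predictions from two hypothetical models.
actual = [10000.0, 12000.0, 14000.0, 16000.0]
model_a = [10500.0, 11500.0, 14500.0, 15500.0]  # small error on every sample
model_b = [10000.0, 12000.0, 14000.0, 17200.0]  # exact except one large miss

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, mean_relative_error(actual, pred), r_squared(actual, pred))
```

In this toy data, the hypothetical `model_b` has the lower mean relative error while `model_a` has the higher R², which shows how the two criteria can rank models differently, as the abstract reports for random forest and XGBoost.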