Since 2000, China's real estate industry has developed rapidly, and the transaction volume of both first-hand and second-hand houses has continued to rise. Compared with first-hand houses, second-hand houses have the advantage that their surrounding facilities, such as business districts, transportation, medical care, schools and other public infrastructure, are more mature and more complete. Since 2011, sales of second-hand houses have exceeded those of first-hand houses, the second-hand house market has become increasingly prosperous, and a large amount of second-hand house transaction data has been generated. With the continuous development of Big Data and Machine Learning, it is of great significance to analyze the relationship between housing resources and second-hand house prices from this massive transaction data and to predict second-hand house prices accurately. On the one hand, accurate prediction provides a price reference for second-hand house buyers, helps return prices to a rational level, and promotes social fairness and harmony; on the other hand, it reduces transaction risks and conflicts between buyers and sellers, promotes a win-win outcome for both parties, and thereby helps regulate the second-hand house market and promote the harmonious development of real estate.

Based on the Spark Big Data processing framework, this thesis takes more than 90,000 real transaction records of Shenzhen second-hand houses from 2010 to 2020, crawled from the LIANJIA website, as the original dataset. It applies Machine Learning algorithms to value Shenzhen second-hand house prices using a model fusion method that combines LightGBM with an improved XGBoost model, which improves the accuracy of house price valuation and is better suited to real estate price evaluation. The main research content and contributions are as follows:

(1) Introducing POI (Point of Interest) data into the dataset. Based on the latitude and longitude in the dataset, combined with Baidu Map, POI processing is applied to expand the dataset and make it more realistic and meaningful. Compared with the evaluation results obtained on the dataset without POI, the dataset with POI yields more accurate results.

(2) An improved XGBoost model. The XGBoost model is improved by using the grid search algorithm and K-fold cross validation to find the most suitable parameters, which effectively reduces overfitting and underfitting and improves prediction accuracy.

(3) A second-hand house price assessment approach based on the fusion of the LightGBM model and the improved XGBoost model. Experimental results show that the fusion of the LightGBM model and the improved XGBoost model gives the best house price assessment and predicts better than either single model.

Therefore, the algorithm proposed in this thesis improves the accuracy of price evaluation for second-hand houses in Shenzhen. Combining Big Data processing with Machine Learning algorithms overcomes the low prediction accuracy that results from relying on a single method. At the same time, the added POI data increases the practical significance of the dataset, improves the accuracy of house price valuation, enriches the methods available for second-hand house price valuation, and provides a new approach for modern real estate valuation.
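
The fusion idea in contribution (3) can be illustrated with a minimal sketch. The abstract does not state how the two models are combined, so the weighted average below is an assumption rather than the thesis's method, and the function names (`rmse`, `fuse`, `best_weight`) and the sample prices are hypothetical. The predictions are hard-coded stand-ins for the outputs of a LightGBM model and a tuned XGBoost model on a validation set.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fuse(pred_a, pred_b, weight):
    """Weighted average of two models' predictions; weight applies to pred_a."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(pred_a, pred_b)]


def best_weight(y_true, pred_a, pred_b, steps=20):
    """Grid-search the blend weight in [0, 1] that minimizes validation RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(y_true, fuse(pred_a, pred_b, w)))


# Hypothetical validation data (not taken from the thesis), in units of 10,000 CNY.
y_val = [300.0, 450.0, 520.0, 610.0]
pred_lgbm = [312.0, 444.0, 529.0, 607.0]  # stand-in for LightGBM predictions
pred_xgb = [292.0, 460.0, 515.0, 617.0]   # stand-in for tuned XGBoost predictions

w = best_weight(y_val, pred_lgbm, pred_xgb)
fused = fuse(pred_lgbm, pred_xgb, w)

print("blend weight for LightGBM:", w)
print("RMSE LightGBM:", rmse(y_val, pred_lgbm))
print("RMSE XGBoost: ", rmse(y_val, pred_xgb))
print("RMSE fused:   ", rmse(y_val, fused))
```

Because the candidate weights include 0 and 1, the fused RMSE on the validation set can never be worse than that of either single model. In practice the weight should be chosen on held-out data, for example with the same K-fold cross validation used to tune XGBoost, so that the blend does not overfit the validation set.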