
Research On Second-Hand House Price Prediction In Yantai City Based On Machine Learning Methods

Posted on: 2024-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Lv
GTID: 2568307136952419
Subject: Applied Statistics
Abstract/Summary:
As a pillar of China's economic construction, the real estate industry has laid a solid foundation for people's well-being and the country's steady development. The CBRC has repeatedly stated that housing policy should adhere to the principle that "houses are for living in, not for speculation" and should continue to improve the transaction mechanism to stabilize housing prices and expectations. As the development of new housing is restricted in many places, the real estate industry is shifting from an incremental market to a market of existing housing stock, which means that more second-hand houses are entering the property transaction market. Several cities have already introduced a reference price mechanism for second-hand properties, which aims to regulate the transaction market and clean up online channels for buying and selling property. The policy relies on big data and artificial intelligence to provide major cities with accurate and effective reference prices for second-hand houses, giving buyers and sellers more transparent and rational transaction prices.

Yantai, a coastal city in Shandong Province, has seen steady economic growth in recent years and has created a number of competitive emerging industrial clusters. With the promotion of Yantai's "12335" large urban area construction pattern, the city is expected to attract many people from outside to settle there, yet Yantai has not issued a reference price mechanism platform for second-hand houses. A study of the factors and trends behind second-hand property prices in Yantai can therefore give further insight into the development and rational regulation of this market, and can serve as a reference for the Yantai government when it formulates relevant policies.

In this paper, we use data from Yantai real estate transaction websites to build a database of price-influencing features of second-hand properties in Yantai from a microscopic perspective, and through comparative model analysis we obtain a machine learning model that can effectively evaluate second-hand property prices in Yantai. The study has four main parts: data acquisition and pre-processing, feature selection, model construction, and comparative experiments. The main results and conclusions are as follows:

(i) Using web crawlers on the Yantai Lianjia (Chain Home) second-hand house transaction website, the Yantai Anjuke website, and the Gaode Map API, we collected 25,892 records of second-hand houses in Yantai. The records cover the physical condition of the house itself, the environment of the area where the house is located, and the infrastructure around the house. After visual analysis and exploratory research, the dataset was cleaned and sorted, leaving 24,421 complete records.

(ii) The Yantai second-hand housing dataset contains 43 feature variables: 17 numerical and 26 categorical. To suit the training of the algorithms, the 26 categorical variables are quantified and encoded in different ways, and CatBoost's embedded feature selection is used to filter the dataset and obtain the best feature subset for modeling.

(iii) To construct a machine learning-based price prediction model for second-hand properties in Yantai, the CatBoost algorithm is used as the main reference model, and its feasibility and principles are introduced. The GridSearchCV method is used to optimize CatBoost, and SHAP value visualization is added to strengthen model interpretability from the perspective of feature contributions. In addition, four commonly used property valuation models are constructed to verify the validity of the model: decision tree, K-nearest neighbor, AdaBoost, and random forest.

(iv) Five-fold cross-validation was used to divide the Yantai second-hand housing dataset into training and validation sets. Five types of machine learning models (CatBoost, decision tree, K-nearest neighbor, AdaBoost, and random forest) were trained on the training set, and the optimized CatBoost was added to the comparison. Corresponding evaluation indexes were selected, and the models were compared and tested on their training results. The results show that the optimized CatBoost model has the highest goodness-of-fit and the smallest prediction error, with a coefficient of determination of 0.967, a mean absolute error of 126.17, a root mean square error of 297.19, and a mean relative error of 10.2%. The SHAP value analysis indicates that floor area and business district are the more important factors affecting second-hand house transaction prices.

This research yields a machine learning model that can effectively predict second-hand house prices in Yantai, further enriches the set of algorithms available for house price evaluation, and provides a feasibility analysis for a future reference price policy for second-hand houses in Yantai. Data covering a longer time span can later be added on this basis to obtain a more stable house price evaluation model.
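The evaluation indexes and the five-fold split described in (iv) can be illustrated with a minimal, standard-library Python sketch. It is not the thesis's code: the function names `regression_metrics` and `five_fold_indices` are hypothetical, the "mean relative error" is assumed here to be the mean absolute percentage error, and the split is a plain shuffled partition rather than whatever cross-validation utility the author actually used.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, RMSE and MAPE (in percent) for paired price lists.

    Assumes no true price is zero (MAPE divides by it) and that the true
    prices are not all identical (R^2 divides by their total variation).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)                 # reproducible shuffle
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]
```

As a quick check, true prices `[100, 200, 300, 400]` against predictions `[110, 190, 310, 390]` give an MAE of 10, an RMSE of 10, and a coefficient of determination of 0.992. In a full comparison, each model would be fitted on every `train` index set and scored on the matching validation set, with the metrics averaged over the five folds.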
Keywords/Search Tags: second-hand house, web crawler, GridSearchCV, CatBoost algorithm, SHAP