
Research On The Prediction Of Transaction Price Of Second-hand House Based On Integrated Learning

Posted on: 2021-01-20 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Zhao
GTID: 2439330626455577 | Subject: Applied Statistics
Abstract/Summary:
In 2018, second-hand housing transactions accounted for 34.15% of national housing sales. The transaction process for a second-hand house is more complicated than for a newly built one, and 80% of second-hand buyers rely on an intermediary agency. However, to capture high profits, intermediary agencies often misrepresent housing prices. This behavior not only causes economic losses to consumers but also sours the trading climate and disorders the market. Studying the actual transaction prices of second-hand housing therefore has clear practical significance.

This paper uses data on all second-hand houses sold in Pudong District, Shanghai, in 2019. In addition, drawing on hedonic (characteristic) price theory, it crawls Baidu Map POI (point-of-interest) data based on each listing's longitude and latitude and uses these data as the listing's location factors. Because the raw data suffer from missing values, noise, redundancy, imbalance, outliers and similar problems, the paper first cleans them into a form suitable for statistical analysis and modeling, and groups and explains the features of a second-hand house as basic, behavioral, transaction and location attributes.

Second, the paper focuses on the statistical analysis of the transaction attributes: using samples, it identifies the statistical pattern in the difference between listing price and transaction price, and points out the limitations of using the listing price as the reference price for second-hand housing transactions. Third, it studies the distributions of the features in the other three attribute groups and examines in detail how each feature relates to the transaction price.

To cope with the complexity of the samples and the uneven distribution of listings that characterize second-hand housing data, the paper builds two decision-tree ensemble models, random forest and XGBoost, tunes their hyperparameters with grid-search cross-validation, and evaluates predictive performance with several evaluation metrics. The results show that both models fit well, and that the random forest model, based on bagging, predicts more accurately than the XGBoost model, based on boosting. In addition, all four attribute groups contribute to improving predictive performance, and a ranking of feature contributions is given.

The paper then performs feature selection based on XGBoost feature importance and carries out model fusion with the random forest's predictions. The fused model predicts substantially better than either initial model, and its prediction error is substantially smaller than the difference between listing price and transaction price, which shows that its predictions are a better transaction reference than the listing price. The results can also be used to analyze and predict the actual transaction prices of second-hand housing in real time, providing a more transparent and accurate reference price.
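The abstract names evaluation metrics, an XGBoost importance ranking and a fusion step without giving their details. The Python sketch below shows one plausible way to wire these pieces together once the two models have produced predictions. It assumes common regression metrics (MAE, RMSE, MAPE, R²), importance scores held in a plain dictionary keyed by feature name (for example, a fitted model's `feature_importances_` paired with its column names), and a weighted average as the fusion rule. The abstract does not say which metrics or which fusion rule the thesis actually used, and every function name and number here is hypothetical.

```python
"""Illustrative sketch only: regression metrics, importance-based feature
ranking and a weighted blend of two models' predictions. The weighted-average
fusion rule and all names are assumptions, not the thesis's documented method."""
import math
from typing import Dict, List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes no zero prices)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Names of the k most important features (ties broken alphabetically)."""
    ranked = sorted(importances.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def blend(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Weighted average w * pred_a + (1 - w) * pred_b."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def best_blend_weight(y_val: Sequence[float], pred_a: Sequence[float],
                      pred_b: Sequence[float], steps: int = 10) -> float:
    """Try w = 0, 1/steps, ..., 1 and return the one with the lowest MAE on y_val."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda w: mae(y_val, blend(pred_a, pred_b, w)))


if __name__ == "__main__":
    # Made-up prices (e.g. in 10,000 CNY) used only to exercise the functions.
    actual = [500.0, 620.0, 480.0, 750.0]
    listing = [540.0, 660.0, 500.0, 800.0]
    rf_pred = [505.0, 610.0, 490.0, 740.0]
    xgb_pred = [490.0, 630.0, 470.0, 760.0]

    # The weight is tuned and scored on the same rows only to keep the toy short;
    # in practice it should be chosen on a separate validation split.
    w = best_blend_weight(actual, rf_pred, xgb_pred)
    fused = blend(rf_pred, xgb_pred, w)
    for name, pred in [("listing", listing), ("rf", rf_pred),
                       ("xgb", xgb_pred), ("fused", fused)]:
        print(f"{name:8s} MAE={mae(actual, pred):6.2f} RMSE={rmse(actual, pred):6.2f} "
              f"MAPE={mape(actual, pred):5.2f}% R2={r2(actual, pred):6.3f}")
    print("blend weight on rf:", w)
    print(top_k_features({"area": 0.31, "metro_poi": 0.12, "floor": 0.05}, k=2))
```

Scoring the listing price with the same metric functions gives the baseline the abstract compares against: if the fused predictions have a lower error than the listing prices, they are the closer reference. A fixed weighted average is only the simplest fusion rule; stacking, where one model's predictions become an input to a second-stage model, is another common option.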
Keywords/Search Tags: Second-hand house price, Dimension expansion, Integrated learning, Model integration