Fast and reliable data support is important throughout a construction project, including the feasibility study, investment decision, quota design, scheme selection, and bidding decision. Such support depends on making full use of historical data to predict project cost accurately and rapidly. Strengthening the accumulation and application of engineering cost data is also the direction in which engineering cost informatization is developing in China, and a requirement of the State Council for promoting the sustainable and healthy development of the construction industry. The motivation of this paper is therefore to mine historical engineering data thoroughly with data mining technology and to predict cost efficiently with scientific methods. Firstly, this paper selects the prediction method by considering the characteristics of engineering construction data and cost data. The data in this paper come from housing projects, which are typically characterized by limited sample size, difficult collection, and multiple attributes. The cost prediction of housing projects is therefore a small-sample learning problem. Based on an analysis of the application scenarios, advantages, and disadvantages of previous methods, this paper selects the support vector machine (SVM) algorithm as the prediction method, because it has distinct advantages in regression prediction on small-sample data. Secondly, this paper takes housing engineering as the study object and collects a large number of housing projects and their cost data from the Guanglianda index network and the national construction project cost monitoring platform. This paper then determines the characteristic indicators for project cost prediction on the basis of an extensive literature review, taking into account the limitations of the sample engineering data and the advice gathered in expert interviews. Next, the feature indexes are encoded and quantified according to the requirements of the data mining algorithm. In terms of
the sample data, abnormal values are screened out with an improved K-means clustering method using MATLAB and SPSS, and 160 samples are retained as the final objects of this study. Finally, this paper trains and tests the SVM, PCA-SVM, and PLSR-SVM models on the 160 sample projects. The predicted indexes include unilateral cost, partial project cost, measure project cost, other project cost, regulation cost, tax, and other charges. The data of 127 projects are used as training samples, and the data of 33 projects are used as case projects to test the models. In the evaluation of the prediction models, the SVM model has the best prediction accuracy and robustness for the unilateral cost and the partial project cost. For the unilateral cost, the SSE, MSE, R², and range reach their best values of 0.0309, 0.0018, 0.9284, and 0.1102, respectively. Although the PCA-SVM and PLSR-SVM models slightly shorten the prediction time, their prediction accuracy and robustness are worse than those of the SVM model. Analysis of the prediction results shows that the partial project cost and the "unilateral cost" after deducting other project costs and other charges are highly predictable, while the measure project cost and other project costs are relatively hard to predict because of large differences between projects and strong subjectivity in these costs. Therefore, this paper develops a rapid prediction system for engineering cost based on the SVM model, which has the best prediction effect, and realizes rapid prediction of engineering cost through simple project feature input. Against the current state of research on rapid prediction of housing project cost, the innovations of this paper are as follows. Firstly, this paper uses an improved K-means clustering method to screen out outliers during data preprocessing. After obtaining complete and valid case data, this paper compares and analyzes the results of the three models based on the idea of data mining, which shows
that the SVM model has the best overall performance. Based on the best model, this paper develops a rapid engineering cost prediction system that predicts engineering cost quickly from simple, interactive input of engineering features.
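
The evaluation indexes named above (SSE, MSE, R², and range) can be illustrated with a minimal sketch. It assumes the conventional definitions: SSE is the sum of squared residuals, MSE is SSE divided by the number of test samples, and R² is one minus SSE over the total sum of squares. It further assumes that "range" means the spread between the largest and smallest residual. The abstract does not state these definitions, so the paper's own formulas may differ. The function name `evaluate_predictions` and the sample values are hypothetical and are not data from the study.

```python
from typing import Dict, Sequence


def evaluate_predictions(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Return SSE, MSE, R^2 and residual range for one predicted cost index."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    # Residual = observed value minus predicted value, one per test project.
    residuals = [a - p for a, p in zip(actual, predicted)]

    sse = sum(r * r for r in residuals)  # sum of squared errors
    mse = sse / len(actual)              # mean squared error (assumed: SSE / n)

    # Total sum of squares around the mean of the observed values.
    mean_actual = sum(actual) / len(actual)
    sst = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sse / sst if sst > 0 else float("nan")

    # Assumption: "range" is the largest residual minus the smallest residual.
    error_range = max(residuals) - min(residuals)

    return {"SSE": sse, "MSE": mse, "R2": r2, "range": error_range}


if __name__ == "__main__":
    # Made-up normalised costs for four test projects, not values from the paper.
    actual = [0.20, 0.40, 0.60, 0.80]
    predicted = [0.25, 0.35, 0.65, 0.75]
    for name, value in evaluate_predictions(actual, predicted).items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, a model comparison would call the function once per model and per cost index on the 33 test projects and prefer the model with lower SSE, MSE, and range and higher R².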