
Research On Second-hand Car Price Forecast Based On Machine Learning

Posted on: 2022-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: X J Xiang
Full Text: PDF
GTID: 2492306530488824
Subject: Applied Statistics
Abstract/Summary:
Rising living standards have brought cars into more and more ordinary households, and demand for cars has risen sharply. In recent years, with the rapid development of the Chinese automobile industry, the number of cars has continued to grow, and a steady stream of them has flowed into the second-hand market. Coupled with a gradual change in consumption habits, this has brought the second-hand car market unprecedented prosperity. However, the market currently lacks a unified price evaluation standard, which has led to frequent cases of arbitrary pricing and of transaction prices that are too low or too high. This is a major obstacle to transactions and seriously affects the healthy and orderly development of the second-hand car market. Therefore, applying machine learning methods to a large volume of second-hand car indicator data, this paper explores the factors that significantly affect second-hand car prices and builds an accurate and reasonable price evaluation system that can serve as a price reference for buyers, sellers, and intermediary platforms. Such a system is of great significance for promoting standardized pricing in the second-hand car market.

This paper forecasts prices using nationwide second-hand car data from the Renren car platform. First, the data are preprocessed and explored, including the handling of missing values and outliers and the conversion of data features, and the indicators are grouped and specified under three aspects: used car model parameters, vehicle condition factors, and regional factors. Next, descriptive analysis and visualization are carried out on the preprocessed data to give an initial view of the distribution of the data and the correlations between variables. Then, using the variance selection method, the correlation coefficient method, the mutual information method, the decision-tree-based CART method, and the SVR-RFECV method for feature selection, 12 important independent variables are screened out after comprehensive analysis: automobile brand, country, new car price, mileage, vehicle age, emission standard, gearbox type, annual inspection expiration time, number of seats, vehicle type, displacement, and MIIT comprehensive fuel consumption. In addition, PCA feature extraction is performed on the original data, and the first 9 principal components are selected so that the prediction performance of models under the two variable processing methods can be compared.

For price prediction, random forest, extreme random tree (extremely randomized trees), and GBDT models are built on each of the two datasets, one obtained by feature selection and one by feature extraction. After the model parameters are tuned, the final prediction models are established. Measured by MSE, the models that use feature selection predict better than those that use feature extraction: the MSE of the three models is lower by 1.013 on average. Among them, the GBDT model performs best, with an MSE of 7.7251. This is still a large prediction error for low-end and middle-end cars, and the plot of predicted against actual values shows that predictions for high-end cars are poor. For these reasons, the data are divided by price. For middle-end and low-end cars priced below 500,000 yuan, the GBDT model with feature selection predicts best: the goodness of fit is 0.979, the MSE is lower than 0.987, and the average estimation error is 0.9934 ten thousand yuan (about 9,934 yuan), which gives it strong reference value. For high-end cars priced above 500,000 yuan, the GBDT model is still the best, but its MSE is as high as 158.29, indicating that the existing indicators are not enough to explain the price fluctuations and that more factors need to be considered when evaluating such cars.

To reduce these price errors in actual transactions, analysis of covariance is then used to study the size and direction of the effect of each variable on second-hand car price fluctuations, and the results of this analysis are used to move toward standardized prices. Finally, based on the research content and conclusions, the paper offers suggestions for buying and selling two groups of vehicles: middle-end and low-end second-hand cars, and high-end second-hand cars. It also points out the shortcomings of this work and directions for future work with respect to the collection of data variables, the authenticity of some indicators, and the applicability of the model.
Keywords/Search Tags: price forecast, feature engineering, Random Forest, Extreme Random Tree, GBDT
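
The segmented evaluation described in the abstract (splitting cars at 500,000 yuan and reporting MSE and goodness of fit for each group) can be illustrated with a minimal sketch. This is not the thesis code. The function names and the sample prices are hypothetical, prices are assumed to be in units of ten thousand yuan, and "goodness of fit" is assumed to mean the coefficient of determination R². The abstract does not define its "average estimation error"; the root of the MSE is used here as an assumption, which is at least consistent with the reported figures, since the square root of 0.987 is about 0.993.

```python
from math import sqrt
from statistics import mean

# Assumption: prices are in units of ten thousand yuan, so 50.0 = 500,000 yuan.
THRESHOLD = 50.0


def mse(actual, predicted):
    """Mean squared error between actual and predicted prices."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination (assumed meaning of 'goodness of fit')."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual prices are equal")
    return 1.0 - ss_res / ss_tot


def evaluate_by_segment(actual, predicted, threshold=THRESHOLD):
    """Split cars by actual price and compute metrics for each segment."""
    segments = {"low_mid": ([], []), "high": ([], [])}
    for a, p in zip(actual, predicted):
        key = "low_mid" if a < threshold else "high"
        segments[key][0].append(a)
        segments[key][1].append(p)

    report = {}
    for name, (ys, preds) in segments.items():
        if len(ys) < 2:
            continue  # too few cars in this segment to compute R^2
        seg_mse = mse(ys, preds)
        report[name] = {
            "mse": seg_mse,
            "rmse": sqrt(seg_mse),  # assumed reading of "average estimation error"
            "r2": r_squared(ys, preds),
        }
    return report


# Hypothetical prices for illustration only; these are not data from the thesis.
actual = [5.0, 12.0, 30.0, 45.0, 60.0, 120.0]
predicted = [6.0, 11.0, 31.0, 44.0, 70.0, 100.0]

report = evaluate_by_segment(actual, predicted)
for name, metrics in report.items():
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Reporting the metrics per segment rather than over the pooled data matters because squared error is dominated by the most expensive cars: a few large misses on high-end vehicles can hide an otherwise close fit on the low-end and middle-end majority.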