Prediction Of Individual Travel Behavior Based On Data Mining Algorithms

Posted on: 2022-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: X L Cui
GTID: 2518306542961989
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of modern communication technology, the mobile phone has become an indispensable communication tool. It records people's activity trajectories in physical space and also in cyberspace. Extensive analysis and mining of such data has already produced good results in public transportation, but tourism has received less attention. If users' future travel behavior cannot be predicted effectively, organizations and decision makers cannot prepare accurately in advance. This easily leads to an imbalance between supply and demand and to wasted resources. There is therefore an urgent need for effective and accurate methods that predict users' future travel behavior, help avoid risks, and provide a useful reference for organizations, decision makers and individuals. To this end, this thesis combines real data from the Anhui mobile communication industry with open government data to build a model that predicts users' travel behavior. The main work can be summarized as follows.

(1) As an alternative to traditional data cleaning (deletion and imputation), a data mining algorithm is proposed. The algorithm first applies one-hot encoding to the non-continuous categorical variable in the data set, the application category. It then selects feature variables according to the Pearson correlation coefficients between the target variable and each feature variable, visualized as a correlation heatmap. Finally, it processes the trajectory behavior table by user ID and by the column that labels whether a location lies in a scenic area rated 4A or above in the province. From this it derives each user's total stay time, scenic-area stay time and non-scenic-area stay time. The algorithm handles the categorical variable, expands the feature dimension and retains the most informative features. The result is the data set used for predicting users' travel behavior.

(2) Class imbalance in the training set makes the model's classification performance unsatisfactory. To address this, simple random oversampling is used. Its core idea is to build a new, balanced data set by copying minority-class samples until the minority and majority classes have the same size. After balancing, users with travel behavior make up 50.3% of the data and users without travel behavior 49.7%.

(3) For model selection, extreme gradient boosting (XGBoost), CatBoost, LightGBM and a Stacking fusion model are used. Their parameters are optimized by grid search combined with cross-validation (GridSearchCV). After data balancing and grid-search cross-validation, the test results show that every classification algorithm performs significantly better than the traditional approach. Among the single models, XGBoost performs best, followed by CatBoost and then LightGBM. All three reach an area under the ROC curve above 0.8, indicating good predictive performance. The Stacking fusion model performs best overall, with a precision of 87.65%, a recall of 94.61% and an F1 score of 91.00%, which improves on every single model. The Stacking fusion model is therefore better suited to predicting user travel behavior. Organizations and other stakeholders can adjust management decisions according to predicted changes in the number of users who travel, reducing unnecessary losses, ensuring public safety and raising people's happiness index.

In summary, this thesis studies three problems in predicting individual travel behavior in detail: incomplete data processing, class imbalance in the training set, and the selection of optimal model parameters. The resulting model predicts changes in individual travel behavior accurately. It also outputs a ranking of feature importance, which supports the optimization and improvement of subsequent statistical work. The model is highly interpretable, and the research has practical value.
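The feature-engineering step in (1) can be illustrated with a minimal Python sketch that uses only the standard library. It is not the thesis's implementation. The column names (`app_category`, `night_calls`), the toy values and the 0.6 cut-off are all hypothetical. The Pearson coefficient is computed against a 0/1 travel label, which makes it the point-biserial correlation. A correlation heatmap would simply visualize the same pairwise coefficients.

```python
import math
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], prefix: str) -> Dict[str, List[int]]:
    """Expand one categorical column into a 0/1 indicator column per category."""
    categories = sorted(set(values))
    return {f"{prefix}={c}": [1 if v == c else 0 for v in values] for c in categories}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant.

    The 1/n factors cancel, so unnormalized sums of squares are sufficient.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def select_features(features: Dict[str, Sequence[float]],
                    target: Sequence[float],
                    threshold: float) -> List[str]:
    """Keep features with |r| >= threshold against the target, strongest first."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: -abs(scores[name]))


# Hypothetical toy data: one categorical column, one numeric column, 0/1 travel label.
app_category = ["video", "map", "video", "travel", "map", "travel"]
night_calls = [5, 3, 4, 1, 2, 0]
traveled = [0, 0, 0, 1, 0, 1]

features: Dict[str, Sequence[float]] = dict(one_hot(app_category, "app_category"))
features["night_calls"] = night_calls
selected = select_features(features, traveled, threshold=0.6)
print(selected)  # ['app_category=travel', 'night_calls']
```

Here the `video` and `map` indicators each correlate with the label at r = -0.5, so the 0.6 cut-off drops them.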
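The per-user stay-time features described in (1) amount to a group-by over the trajectory table. The abstract does not give the table schema. The sketch below therefore assumes each record carries three hypothetical fields: a user ID, a stay duration in minutes, and a boolean flag for whether the stay point lies in a scenic area rated 4A or above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class StayRecord:
    """One row of the trajectory table (hypothetical schema)."""
    user_id: str
    stay_minutes: float
    in_4a_scenic_area: bool  # hypothetical flag: stay point is in a 4A-or-above scenic area


@dataclass
class UserStayFeatures:
    total_stay: float = 0.0
    scenic_stay: float = 0.0
    non_scenic_stay: float = 0.0


def aggregate_stays(records: Iterable[StayRecord]) -> Dict[str, UserStayFeatures]:
    """Group records by user ID and sum stay time overall, inside and outside scenic areas."""
    per_user: Dict[str, UserStayFeatures] = defaultdict(UserStayFeatures)
    for rec in records:
        feats = per_user[rec.user_id]
        feats.total_stay += rec.stay_minutes
        if rec.in_4a_scenic_area:
            feats.scenic_stay += rec.stay_minutes
        else:
            feats.non_scenic_stay += rec.stay_minutes
    return dict(per_user)


# Hypothetical toy trajectory rows.
rows = [
    StayRecord("u1", 30, True),
    StayRecord("u1", 90, False),
    StayRecord("u2", 45, False),
    StayRecord("u1", 60, True),
]
features = aggregate_stays(rows)
print(features["u1"])  # UserStayFeatures(total_stay=180.0, scenic_stay=90.0, non_scenic_stay=90.0)
```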
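Simple random oversampling as described in (2) duplicates randomly drawn minority-class samples, with replacement, until each class matches the majority class. The following is a hedged standard-library illustration on hypothetical toy data. In practice the same idea is available as `RandomOverSampler` in the imbalanced-learn package.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(samples: Sequence[T], labels: Sequence[int],
                      seed: int = 0) -> Tuple[List[T], List[int]]:
    """Copy randomly drawn samples of each smaller class (with replacement)
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, size in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = rng.choices(pool, k=majority_size - size)  # empty for the majority class
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Hypothetical toy training split: 6 non-travelers (0) and 2 travelers (1).
x_train = ["a", "b", "c", "d", "e", "f", "g", "h"]
y_train = [0, 0, 1, 0, 0, 0, 1, 0]
x_bal, y_bal = random_oversample(x_train, y_train, seed=42)
print(sorted(Counter(y_bal).items()))  # [(0, 6), (1, 6)]
```

In this sketch the classes come out exactly equal in size. Oversampling is normally applied to the training split only, so that duplicated rows cannot appear in both training and test data.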
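Parameter tuning in (3) combines an exhaustive grid search with k-fold cross-validation, which is what scikit-learn's `GridSearchCV` does. To stay self-contained, the sketch below reimplements that loop in plain Python. It tunes a hypothetical toy model, one-dimensional k-nearest neighbours with a grid over `k`, in place of XGBoost, CatBoost or LightGBM. Folds are assigned round-robin without shuffling, and mean validation accuracy is the selection score. All of these choices are assumptions made for illustration.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, int]
# fit_predict(params, train_x, train_y, test_x) -> predicted labels for test_x
FitPredict = Callable[[Params, Sequence[float], Sequence[int], Sequence[float]], List[int]]


def k_fold_indices(n: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Round-robin (unshuffled) k-fold split; returns (train, validation) index lists."""
    splits = []
    for fold in range(n_folds):
        val = list(range(fold, n, n_folds))
        train = [i for i in range(n) if i % n_folds != fold]
        splits.append((train, val))
    return splits


def grid_search_cv(fit_predict: FitPredict, grid: Dict[str, List[int]],
                   xs: Sequence[float], ys: Sequence[int],
                   n_folds: int = 3) -> Tuple[Params, float]:
    """Try every parameter combination; keep the best mean validation accuracy."""
    names = sorted(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, val in k_fold_indices(len(xs), n_folds):
            preds = fit_predict(params, [xs[i] for i in train], [ys[i] for i in train],
                                [xs[i] for i in val])
            hits = sum(p == ys[i] for p, i in zip(preds, val))
            fold_scores.append(hits / len(val))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, mean_score
    return best_params, best_score


def knn_fit_predict(params: Params, train_x: Sequence[float], train_y: Sequence[int],
                    test_x: Sequence[float]) -> List[int]:
    """Hypothetical toy model: 1-D k-nearest neighbours with a majority vote."""
    preds = []
    for x in test_x:
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[: params["k"]]
        votes = sum(train_y[i] for i in nearest)
        preds.append(1 if 2 * votes > len(nearest) else 0)
    return preds


# Hypothetical toy data with one noisy label (x = 3 is labelled 1).
xs = [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]
ys = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
best, score = grid_search_cv(knn_fit_predict, {"k": [1, 3, 5]}, xs, ys, n_folds=3)
print(best, round(score, 3))  # {'k': 3} 0.917
```

Here k = 3 beats k = 1 because the majority vote smooths over the mislabeled neighbour at x = 3.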
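The Stacking fusion model in (3) trains a second-level meta-learner on the predictions of first-level base learners. One common recipe uses out-of-fold predictions on the training set as the meta-features. This ensures the meta-learner never sees a base learner's prediction on rows that learner was fitted to. The sketch below follows that recipe with hypothetical components:

- The base learners, a nearest-centroid rule and 1-nearest-neighbour, stand in for XGBoost, CatBoost and LightGBM.
- The meta-learner is a simple lookup table that predicts the majority label for each pattern of base-learner outputs.

Stacking is more often done with a logistic-regression meta-learner, for example via scikit-learn's `StackingClassifier`. The abstract does not say which meta-learner the thesis used.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A base learner fits on (train_x, train_y) and returns labels for test_x.
Learner = Callable[[Sequence[float], Sequence[int], Sequence[float]], List[int]]
MetaLearner = Callable[[List[Tuple[int, ...]], Sequence[int], List[Tuple[int, ...]]], List[int]]


def k_fold_indices(n: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Round-robin (unshuffled) split into (train, validation) index lists."""
    return [([i for i in range(n) if i % n_folds != f], list(range(f, n, n_folds)))
            for f in range(n_folds)]


def nearest_centroid(train_x, train_y, test_x) -> List[int]:
    """Hypothetical base learner: predict the class whose mean x is closest."""
    means = {}
    for cls in sorted(set(train_y)):
        pts = [x for x, y in zip(train_x, train_y) if y == cls]
        means[cls] = sum(pts) / len(pts)
    return [min(means, key=lambda c: abs(means[c] - x)) for x in test_x]


def one_nn(train_x, train_y, test_x) -> List[int]:
    """Hypothetical base learner: copy the label of the nearest training point."""
    return [train_y[min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))] for x in test_x]


def pattern_vote_meta(meta_x, meta_y, test_meta) -> List[int]:
    """Hypothetical meta-learner: majority label for each pattern of base predictions."""
    table: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for row, y in zip(meta_x, meta_y):
        table[row][y] += 1
    fallback = Counter(meta_y).most_common(1)[0][0]  # for patterns never seen in training
    return [table[row].most_common(1)[0][0] if row in table else fallback for row in test_meta]


def stack_fit_predict(base: List[Learner], meta: MetaLearner, train_x, train_y, test_x,
                      n_folds: int = 3) -> List[int]:
    n = len(train_x)
    oof = [[0] * len(base) for _ in range(n)]
    # 1) Out-of-fold predictions: each training row is predicted by base learners
    #    that were fitted without that row.
    for tr, va in k_fold_indices(n, n_folds):
        for j, learner in enumerate(base):
            preds = learner([train_x[i] for i in tr], [train_y[i] for i in tr],
                            [train_x[i] for i in va])
            for i, p in zip(va, preds):
                oof[i][j] = p
    # 2) Base learners refitted on all training rows produce the test meta-features.
    test_meta = list(zip(*(learner(train_x, train_y, test_x) for learner in base)))
    # 3) The meta-learner is fitted on the out-of-fold meta-features.
    return meta([tuple(row) for row in oof], train_y, test_meta)


# Hypothetical toy data.
train_x = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
train_y = [0] * 5 + [1] * 5
preds = stack_fit_predict([nearest_centroid, one_nn], pattern_vote_meta,
                          train_x, train_y, [0, 2.5, 12.5, 20])
print(preds)  # [0, 0, 1, 1]
```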
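The evaluation figures in (3) can be computed from hard predictions and scores. The sketch below computes precision, recall and F1 for the positive class. It also computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half; this equals the area under the ROC curve. As a consistency check, the harmonic mean of 87.65% and 94.61% is about 91.00%. That matches the reported F1 value when 87.65% is read as precision. The helper names are hypothetical, and the pairwise AUC loop is quadratic, so it only suits small examples.

```python
from typing import Sequence, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Consistency check on the reported figures (precision 0.8765, recall 0.9461).
print(round(f1_score(0.8765, 0.9461), 4))  # 0.91
```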
Keywords/Search Tags: Data Analysis, Travel, XGBoost, LightGBM, CatBoost, Stacking