Data Mining Of Consumption Factors Based On Tourist Attribute Information

Posted on: 2017-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: L F Qin
Full Text: PDF
GTID: 2308330485485119
Subject: Software engineering
Abstract/Summary:
In contemporary society, tourism has become a fashionable form of relaxation and entertainment. However, the income of scenic spots is not optimistic. To address this, many scenic spots have begun to analyze the varied information they collect through many channels, but some of them do not make full use of this data. Data mining can be applied to massive data sets to discover potentially useful knowledge. The main contributions of this thesis are as follows:
(1) Construction of a random forest model. We first preprocess the collected tourism data. Because the processed data set is unbalanced, we apply the SMOTE algorithm to obtain a relatively balanced data set, and on that basis we build a random forest model. From the model we calculate the importance of the feature variables and the partial correlation between the feature variables and the categories. The experimental results show that income level, number of days, travel mode, the extent of the scenic spot's tour theme, the scenic spot's service quality, and ticket price all affect tourist expenditure. Income level is positively correlated with tourist consumption level, as are number of days, travel mode, the extent of the scenic spot's tour theme, the scenic spot's service quality, and ticket price. From the partial correlations we obtain the trend by which changes in each feature variable influence the categorical variable, and on that basis we put forward suggestions for improving the scenic spot.
(2) Improvement of the random forest model. We first analyze in detail the factors that influence the random forest model: the number of attributes considered for node splitting, the number of decision trees in the forest, and the two sources of randomness in the modeling process. We then propose a partition extraction method for the random feature extraction step. Mutual information is used to measure the degree of correlation between each feature variable and the category; the feature variables are sorted by this degree and divided into two correlation intervals. The high-correlation interval includes income level, number of days, travel mode, the extent of the scenic spot's tour theme, the scenic spot's service quality, and ticket price. The low-correlation interval includes education level, way to travel, travel purpose, age, gender, and profession. Setting the algorithm's mtry to 4 and ntree to 1000, we build a new random forest model and find that it classifies better than the original model.
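The balancing step in (1) can be illustrated with a small sketch of the interpolation idea behind SMOTE. This is not the thesis's implementation, which the abstract does not describe. The function name `smote_oversample`, the Euclidean distance, the neighbour count `k`, and the toy data are all assumptions made for illustration. In practice a library version such as `SMOTE` from imbalanced-learn would likely be used, with the balanced data then passed to a random forest.

```python
import math
import random

def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (hypothetical helper, Euclidean distance assumed)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Made-up toy data: [income level, number of days] for the rare class.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two real minority samples, so the rare class grows without exact duplicates being added.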
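The partition extraction idea in (2) might look roughly like the sketch below. It ranks discrete features by their mutual information with the class, splits the ranking into a high and a low interval, and draws split candidates from both. The abstract does not say where the interval boundary lies or how many of the mtry candidates come from each interval. The half-and-half split, the per-interval counts, the function names, and the toy data are therefore assumptions.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def partition_by_relevance(features, labels):
    """Rank features by mutual information with the class, then split the
    ranking into a high interval and a low interval (assumed: equal halves)."""
    ranked = sorted(
        features,
        key=lambda name: mutual_information(features[name], labels),
        reverse=True,
    )
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]

def sample_split_candidates(high, low, n_high, n_low, rng):
    """Draw the candidate features for one node split from both intervals
    (assumed allocation: n_high from high, n_low from low)."""
    return rng.sample(high, n_high) + rng.sample(low, n_low)

# Made-up toy data: 1 = high-consumption tourist, 0 = low-consumption.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "gender":       ["m", "f", "m", "f", "m", "f", "m", "f"],
    "income_level": [3, 3, 3, 3, 1, 1, 1, 1],
    "age_group":    ["a", "a", "b", "c", "a", "b", "c", "c"],
    "days_stayed":  [2, 2, 2, 1, 1, 1, 1, 2],
}
high, low = partition_by_relevance(features, labels)
candidates = sample_split_candidates(high, low, 2, 1, random.Random(0))
```

Drawing from both intervals keeps informative features in most splits while still letting weaker features take part, which is one plausible reading of the partitioned sampling described above.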
Keywords/Search Tags: Tourist consumption influencing factors, data preprocessing, SMOTE, random forest, partitioned feature extraction