Data Mining Of Consumption Factors Based On Tourist Attribute Information

Posted on:2017-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:L F Qin

Full Text:PDF

GTID:2308330485485119

Subject:Software engineering

Abstract/Summary:

In contemporary society, tourism has become a fashionable form of relaxation and entertainment. However, the income of scenic spots is not encouraging. To address this problem, many scenic spots have begun to analyze the varied information they collect through many channels, but some of them do not make full use of such data. Data mining can be applied to massive data sets to uncover potentially useful knowledge. The main contributions are as follows:

(1) Construction of a random forest model. We first preprocess the collected tourism data. Because the processed data set is imbalanced, we apply the SMOTE algorithm to obtain a relatively balanced data set, on which we build a random forest model. From the model we calculate the importance of each feature variable and the partial correlation between the feature variables and the classes. The experimental results show that income level, number of days, travel mode, the extent of the scenic spot's tour theme, service quality of the scenic spot, and ticket price all affect tourist expenditure. Income level is positively correlated with tourist consumption level, and the same holds for number of days, travel mode, the extent of the scenic spot's tour theme, service quality of the scenic spot, and ticket price. The partial correlations show how the class variable tends to change with each feature variable, and on that basis we put forward corresponding suggestions for improving the scenic spot.

(2) Improvement of the random forest model. We first analyze in detail the factors that influence the random forest model: the number of attributes considered for node splitting, the number of decision trees in the model, and the two sources of randomness in the modelling process. For the random feature extraction step we propose a partitioned extraction method. Mutual information is used to measure the degree of correlation between each feature variable and the class; the feature variables are sorted by this degree and then divided into two correlation intervals. The high-correlation interval contains income level, number of days, travel mode, the extent of the scenic spot's tour theme, service quality of the scenic spot, and ticket price. The low-correlation interval contains education level, way to travel, travel purpose, age, gender, and profession. Setting the algorithm's mtry to 4 and ntree to 1000, we build a new random forest model and find that it has good classification performance compared with the original model.
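
The ranking and partition step described in (2) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not state where the boundary between the two intervals lies or how many candidate features are drawn from each interval, so the half-and-half split, the `n_high` parameter, all function names, and the toy data below are assumptions made only for the example.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def partition_features(columns, labels):
    """Rank features by mutual information with the class labels and
    split the ranked list into a high and a low interval.
    Assumption: the boundary is placed at the midpoint of the ranking."""
    ranked = sorted(
        columns,
        key=lambda name: mutual_information(columns[name], labels),
        reverse=True,
    )
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def draw_candidates(high, low, mtry, n_high, rng):
    """Draw mtry candidate features for one node split.
    Assumption: n_high come from the high interval, the rest from the low one."""
    return rng.sample(high, n_high) + rng.sample(low, mtry - n_high)


# Hypothetical toy data: discretized feature values and consumption classes.
labels = ["low", "low", "high", "high", "low", "low", "high", "high"]
columns = {
    "age":          [1, 2, 1, 2, 1, 2, 1, 2],
    "income_level": [1, 1, 2, 2, 1, 1, 2, 2],
    "gender":       [1, 1, 1, 2, 1, 1, 1, 2],
    "days":         [1, 1, 2, 2, 1, 2, 2, 2],
}

high, low = partition_features(columns, labels)
candidates = draw_candidates(high, low, mtry=3, n_high=2, rng=random.Random(0))
print("high interval:", high)
print("low interval:", low)
print("candidates for one split:", candidates)
```

Drawing a fixed share of candidates from the high interval is only one possible way to use the partition. The design idea it illustrates is that every node split sees some strongly class-related features while the forest still keeps randomness across trees.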

Keywords/Search Tags:

Tourist consumption influencing factors, data preprocessing, SMOTE, random forest, partitioned feature extraction

Related items

1	Research On Extraction Method Of Industrial Control Network Security Situation Elements Based On Random Forest
2	Research On Random Forest Similarity Algorithm
3	Research On The Expansion And Classification Of Several Imbalanced Data Sets Based On C-SMOTE Algorithm
4	Research On The Method Of Solving Imbalanced Classification Problems Based On Random Forest Algorithm
5	Research On Analysis Method Of Influencing Factors On The Quality Of SMT Products Based On Big Data
6	Research On The Emotional And Temporal And Spatial Change Of Tourists Based On Microblogging Data
7	Analysis Of Influencing Factors And Short-Term Load Forecast Of Gas Consumption In Urban Pipeline Based On Data Mining
8	Research And Application Of Classification Technology For Unbalanced Data
9	Analysis Of Influencing Factors Of Urban Rental Price Based On Machine Learning Methods
10	Research On DDoS Attack Detection Based On Application Layer