| With the development of the Internet, tourism websites have emerged that provide information on transportation, hotel accommodation and travel guides. At the same time, with rising living standards and rapid economic growth, people travel more frequently for leisure and business. These websites provide information before departure and also serve as platforms for feedback. More and more people are willing to post comments online to share product information and experience, which helps others make better choices and contributes to the development of tourism websites and to improvements in service and product quality. In the big data era, data volumes are larger and data accumulates faster. When readers face a large number of reviews with limited time and energy, review usefulness becomes important for helping them obtain effective information quickly. By analyzing the factors that influence review usefulness, tourism websites can selectively push reviews to users or let users filter reviews by the important influencing factors, which meets different information needs and improves the efficiency of information acquisition. Tourism websites may in turn gain users' trust and favor, and merchants can improve based on feedback from useful reviews. Focusing on online reviews in the hospitality industry, this paper studies the factors influencing the usefulness of online hotel reviews. TripAdvisor is a world-leading travel website that provides reviews and recommendations from travellers around the world, so this paper uses Python to crawl 25,651 Chinese reviews of 96 hotels in 11 cities, posted from August 1, 2016 to August 10, 2018, as research data. Firstly, the raw data is preprocessed, and 24,426 valid reviews are obtained through language screening, invalid review screening, time conversion, word 
segmentation and stop-word removal. Secondly, descriptive statistical analysis is carried out from several aspects, including frequency analysis of variables such as number of historical reviews, number of historical useful votes and traveler type; comparative analysis of monthly useful votes and monthly number of reviews; and word cloud analysis. Thirdly, the values of the variables in the multivariate regression model are computed. For the dependent variable, the review usefulness score is calculated by a weighted similarity algorithm based on Word2Vec. For the independent variables, the influencing factors are divided into three categories: reviewer features, review quantitative features and review qualitative features. Reviewer features include number of historical reviews, number of historical useful votes and traveler type. Review quantitative features include time and length. Review qualitative features include semantic score, sentiment score and photos. An LDA model based on the TF-IDF method is used to analyze review topics for the semantic score, and five methods (naive Bayes, logistic regression, random forest, K-nearest neighbor and support vector machine) are used for sentiment analysis to obtain the sentiment score. The sentiment labels of 2,485 reviews are annotated manually for training, and five-fold cross-validation is used to select the parameters with the best AUC and accuracy for prediction. Finally, a linear regression model is used to analyze the factors influencing usefulness. The results are as follows. Semantic analysis finds that people pay more attention to the hotel's room conditions, food supply level, executive lounge service, check-in and check-out efficiency, membership rights, and hotel location and traffic convenience. Sentiment analysis finds that naive Bayes, logistic regression and support vector machines perform better than K-nearest neighbors and random forests; these three methods are used for prediction and yield 20,186 positive reviews and 1,755 negative reviews. The review usefulness analysis finds that the type of 
reviewer, time, the presence of photos, negative sentiment and the semantics of reviews have a significant positive influence on usefulness. Review length has a significant U-shaped relationship with usefulness, while the number of historical reviews, the number of historical useful votes and the degree of emotional deviation are not significant. In summary, this paper makes the following suggestions for TripAdvisor and hotels. First, push useful reviews or reviews from high-quality reviewers to users, with reference to the reviewer's reputation and number of historical useful votes. Second, improve the information selection functions so that people can selectively read reviews by length, keyword or the presence of photos. Third, optimize the page layout on mobile phones, taking into account information capacity and visual presentation. Fourth, hotels should pay more attention to review photos and improve the quality of details. The innovation of this paper is the use of a weighted similarity algorithm based on Word2Vec, weighted by the number of useful votes, to calculate the usefulness of reviews that have no useful votes. Moreover, the influence of traveler type on review usefulness is considered. The shortcoming is the lack of a word segmentation lexicon for the hotel industry: although the lexicon has been supplemented, the segmentation results are not optimal. In addition, the semantic analysis considers word frequency weights but is deficient in identifying and substituting synonyms. With further study of online reviews by future scholars, better solutions to these problems are expected to be found. |
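
The abstract does not give the formula for the vote-weighted similarity score, so the following is only a minimal sketch of one plausible reading: a review is represented by the mean of its word vectors, and its usefulness is the cosine similarity to each voted review, averaged with the useful-vote counts as weights. The function names (`review_vector`, `cosine`, `usefulness_score`), the averaging scheme and the toy two-dimensional embeddings are all assumptions made for illustration; in practice the vectors would come from a trained Word2Vec model.

```python
import numpy as np

def review_vector(tokens, embeddings, dim):
    """Mean of the word vectors for tokens present in the embedding table (assumed representation)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def usefulness_score(target_vec, voted_vecs, votes):
    """Vote-weighted mean similarity between one review and the reviews that received useful votes."""
    votes = np.asarray(votes, dtype=float)
    total = votes.sum()
    if total == 0.0:
        return 0.0
    sims = np.array([cosine(target_vec, v) for v in voted_vecs])
    return float(np.dot(sims, votes) / total)

# Toy embeddings standing in for a trained Word2Vec model (hypothetical values).
embeddings = {
    "room": np.array([1.0, 0.0]),
    "clean": np.array([1.0, 0.0]),
    "breakfast": np.array([0.0, 1.0]),
}
voted = [
    review_vector(["room", "clean"], embeddings, 2),  # review with 3 useful votes
    review_vector(["breakfast"], embeddings, 2),      # review with 1 useful vote
]
votes = [3, 1]
score = usefulness_score(review_vector(["room"], embeddings, 2), voted, votes)
print(score)
```

Under this reading, a review with no votes scores higher the more closely it resembles reviews that many readers marked as useful; other weighting or normalization choices would be equally consistent with the abstract.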