| The successful hosting of the 2022 Beijing Winter Olympics ignited Chinese people’s enthusiasm for ice and snow tourism, and, riding the momentum of the Games, the ski industry entered a golden age. However, the quality of ski resorts in China is uneven, which leaves many visitors with a poor skiing experience. With advances in Internet technology, more and more people share their opinions on online travel platforms. As competition in the ski market intensifies, identifying tourists’ concerns about ski resorts from this massive and complex body of comments has become an urgent problem. Solving it would let resorts continuously improve their service and better safeguard the tourist experience, and it is also of great significance for the healthy development of China’s ski industry. Drawing on the theories of text sentiment classification and topic modeling, this paper studies the problem systematically with natural language processing and text mining techniques, taking ski resort reviews posted by tourists on online travel platforms as the research object. The specific work is as follows. First, the data are collected with a web crawler and analyzed with descriptive statistics. A dedicated word segmentation dictionary and stop-word list for the ski resort domain are constructed, and on this basis the data are preprocessed (data cleaning, word segmentation and stop-word removal), laying the data foundation for the subsequent models. Second, the traditional sentiment dictionary method does not take the semantic information of sentences into account, so, to classify the review texts accurately, six machine learning models for text sentiment classification are built in Python. The models are trained and tested on an established tourism-domain corpus, and their performance is evaluated comprehensively using accuracy, precision, recall and F1 derived from the confusion matrix, together with the AUC value. The results show that the Long Short-Term Memory (LSTM) neural network model has the best overall performance. Finally, this best-performing LSTM model is used to classify the preprocessed reviews, and the classification results show that tourist satisfaction with Chinese ski resorts is not high. To identify tourists’ concerns further, a Latent Dirichlet Allocation (LDA) topic model is built on the negative reviews; the results show that, among the topic dimensions of the negative reviews, tourists are most concerned about experience and infrastructure. Based on these conclusions, the paper offers targeted suggestions to ski resort managers and contributes to the healthy development of China’s ski industry. |
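The preprocessing step (cleaning, segmentation with a domain dictionary, stop-word removal) can be illustrated with a small sketch. The abstract does not say which segmenter was used, so as a stand-in this uses forward maximum matching, a classic dictionary-based Chinese segmentation technique; `SKI_DICT`, `STOP_WORDS`, `fmm_segment` and `preprocess` are hypothetical names, and the dictionary entries are illustrative, not the thesis's own lists.

```python
import re

# Hypothetical ski-resort dictionary and stop-word list (illustrative only;
# the thesis's own dictionaries are not reproduced here).
SKI_DICT = {"缆车", "排队", "时间", "雪道", "雪具", "租赁", "教练", "服务", "设施"}
STOP_WORDS = {"的", "了", "很", "太", "也", "都"}


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(comment):
    """Clean a raw comment, segment it with the domain dictionary and drop stop words."""
    # Keep only CJK Unified Ideographs (drops punctuation, digits and Latin letters).
    cleaned = re.sub(r"[^\u4e00-\u9fff]", "", comment)
    tokens = fmm_segment(cleaned, SKI_DICT)
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("缆车排队时间太长了，服务很差！"))
# ['缆车', '排队', '时间', '长', '服务', '差']
```

Words missing from the dictionary (here "长" and "差") fall back to single characters, which is why a domain-specific dictionary matters; a production pipeline would more likely load such a dictionary into a dedicated segmenter rather than hand-roll matching.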
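The evaluation metrics named in the abstract can be made concrete with a short sketch for a binary sentiment task. Accuracy, precision, recall and F1 come directly from the confusion-matrix counts, whereas AUC needs the model's scores rather than hard labels; here it is computed with the pairwise-ranking definition. Function names and the toy labels and scores are hypothetical, not results from the thesis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = positive review, 0 = negative review; scores are model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred))  # all four metrics equal 0.75 here
print(roc_auc(y_true, scores))                  # 15 of 16 pairs ranked correctly -> 0.9375
```

Because AUC is threshold-free while the other four metrics depend on the 0.5 cut-off, reporting both, as the abstract describes, gives a fuller picture when comparing classifiers.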
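The abstract does not specify the LSTM architecture (embedding size, number of layers, output layer), so for reference the standard LSTM cell is given below. For a token embedding $x_t$, the gates let the model carry sentence-level context in the cell state $c_t$, which is the semantic information a sentiment dictionary ignores. The final line, a sigmoid output on the last hidden state $h_T$ for binary sentiment, is an assumption for illustration.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
\hat{y} &= \sigma(w^\top h_T + b) && \text{(assumed) probability of positive sentiment}
\end{aligned}
```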
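The LDA step on negative reviews can be sketched with collapsed Gibbs sampling, one standard way of fitting LDA; the abstract does not say which inference method or library was used. Each token's topic is resampled from $p(z_i = k \mid \cdot) \propto (n_{d,k} + \alpha)(n_{k,w} + \beta)/(n_k + V\beta)$, with the token's own count removed first. The hyperparameters, `lda_gibbs`, `top_words` and the toy reviews are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, phi, theta): phi[k][v] is p(word v | topic k) and
    theta[d][k] is p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = j | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][v] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_words(vocab, phi, n=3):
    """The n highest-probability words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]] for row in phi]


# Hypothetical, already-preprocessed negative reviews (not data from the thesis).
negative_reviews = [
    ["缆车", "排队", "时间", "排队"],
    ["雪道", "设施", "缆车", "设施"],
    ["教练", "服务", "态度", "服务"],
    ["服务", "态度", "收费", "教练"],
    ["设施", "雪道", "排队", "缆车"],
    ["收费", "服务", "态度", "教练"],
]
vocab, phi, theta = lda_gibbs(negative_reviews, num_topics=2)
for k, words in enumerate(top_words(vocab, phi)):
    print(f"topic {k}: {words}")
```

Interpreting the topics (for example, labeling one "infrastructure" and another "service") is a manual step based on each topic's top words; on a toy corpus this small the split can vary with the seed.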