Research On Tourism Online Review Based On LDA Model

Posted on: 2023-03-27
Degree: Master
Type: Thesis
Institution: University
Candidate: HATSADONG VONGSAMPHUN
GTID: 2558307097475304
Subject: Computer Science and Technology
Abstract/Summary:
Since China's reform and opening up began in the late 1970s, the national economy has developed rapidly and steadily, and the lives of the Chinese people have become increasingly prosperous. More and more people have shifted from pursuing material comfort to pursuing a higher quality of life. As an important choice for entertainment and relaxation, tourism is a major expression of that pursuit, and the industry has grown quickly along with demand. With this growth, choosing a travel destination has itself become a problem for tourists. China has many tourist attractions: according to statistics, at the end of 2018 there were more than 30,000 scenic spots in China, of which more than 10,000 were A-level scenic spots, including 259 5A scenic spots and 3,034 4A scenic spots. How to choose a preferred attraction quickly and accurately has therefore become a real concern for tourists.

Attractions in China fall mainly into several types: lakes (such as East Lake in Wuhan, West Lake in Hangzhou, Qinghai Lake, and Dongjiang Lake in Hunan), mountains (such as Hengshan in Hunan, Songshan in Henan, Huangshan in Anhui, Huashan in Shaanxi, and Taishan in Shandong), forests (such as Zhangjiajie in Hunan, Xishuangbanna in Yunnan, and Shennongjia in Hubei), landscapes (such as the Li River in Guilin and the Three Gorges of the Yangtze River), and seaside destinations (such as Hainan, Dalian, and Xiamen). People usually travel according to their own preferences. Most famous scenic spots of these types are praised in poems, but poems do not tell a traveler what other tourists have to say about them. In recent years, however, China's Internet has developed rapidly, traditional industries have been reshaped in the "Internet +" era, and the "Internet + travel" scene has emerged in this context. The rise of the "Internet + travel" model has changed the
traditional model, in which information came mainly from acquaintances and tour groups. Tourists can now buy tickets for scenic spots on an Internet platform and, after visiting, post online comments about their overall impression on the same platform; these comments are of great value to later tourists. However, given the massive volume of online tourism review text, it is unrealistic to extract the valuable information by reading reviews manually. Against this background, the application of text mining in the tourism industry has developed rapidly.

Text is one of the most important kinds of unstructured data, and text data mining has important applications in scenarios such as healthcare, marketing, e-commerce media, and the digital humanities. For example, extracting standardized text data from electronic medical records makes it possible to quantify patients' diagnostic results and make reasonable suggestions; keywords can be extracted from consumer reviews on review platforms (such as travel websites and shopping websites); and extracting and quantifying the text on a given topic from public social platforms reveals the public's views on that topic and the related public opinion. These applications have great potential to add value, and exploiting that potential requires systematic text data mining methods.

Text mining is a branch of data mining that takes text as its object and discovers hidden, potentially valuable knowledge in it, such as information structures, models, and patterns. It draws on many fields, including information retrieval, pattern recognition, and natural language processing. Since text is the most important carrier of information, making full use of text mining to acquire knowledge can not only create great commercial and social value but is also an essential information-processing tool as human society moves toward an information civilization.

The
main significance of text mining lies in the following aspects:

(1) Promoting the construction of information technology. Through the continuous development of IT, people are freed from cumbersome data collection and statistics, and intelligent production and service operations become possible.

(2) Improving the efficiency of information use. Because text data is represented, stored, and output in many different forms, the rich information it contains cannot be fully used unless it is effectively converted, classified, and otherwise processed. Text mining makes it possible to find patterns hidden in text and to discover predictive information that might otherwise be overlooked.

(3) Raising the level of artificial intelligence. That level is measured mainly by the ability to judge and understand, the ability to reason and decide, and the ability to carry out commands. Of these, the ability to judge and understand is the most important, and most information is currently stored as natural-language text.

(4) Supporting decision-making. In earlier decision support systems, the knowledge and rules in the knowledge base had to be established by experts or programmers, so their effectiveness depended on the experience and knowledge of individual experts. Once the data reaches a certain scale, the workload of processing and judging it greatly exceeds what experts can handle. The primary task of text data mining is to discover, through suitable mining techniques, knowledge or rules that are hard to find in data; it is a process of acquiring knowledge automatically.

To better help tourists choose their preferred attractions, this paper takes famous tourist attractions in Hunan as a sample and, based on the theory and techniques of Chinese text mining, the text data of online reviews of these attractions on an Internet platform is
analyzed.

As a major province in central China, Hunan has seen continued growth in both tourist numbers and tourism revenue, thanks to its abundant natural tourism resources: Hengshan Mountain and Dongjiang Lake in the south, the Zhangjiajie natural scenic area in the west, Dongting Lake in the north, as well as Yuelu Mountain and others. Zhangjiajie Forest Park includes famous scenic spots such as Jinbianxi, the Shili Gallery, and Huangshizhai. Nanyue Hengshan is one of the Five Great Mountains of China and was among the first batch of national key scenic spots. Yueyang Tower, located in Yueyang City, Hunan Province, is known together with the Yellow Crane Tower in Wuhan, Hubei, and the Tengwang Pavilion in Nanchang, Jiangxi, as one of the "Three Great Towers of Jiangnan" (the region south of the Yangtze River). Yuelu Mountain has famous scenic spots such as the Aiwan Pavilion and Yuelu Academy. Dongjiang Lake is a famous lake attraction in China and a "six-in-one" tourist area: a national 5A tourist attraction, a national-level scenic spot, a national eco-tourism demonstration area, a national forest park, a national wetland park, and a national water conservancy scenic spot.

This paper selects five popular tourist attractions in Hunan Province (Zhangjiajie Forest Park, Nanyue Hengshan, Yueyang Tower, Yuelu Mountain, and Dongjiang Lake) as sample scenic spots. User review data for these scenic spots is obtained from an Internet tourism platform by web crawling, and text mining analysis is then performed on the review samples using custom Python code and the ROST CM6 software. Through text mining of travel review data, the valuable information in the review text can be obtained quickly and accurately, and the usability of online review text can be improved. At the same time, the complete text mining process presented here can serve as a reference for online
comment texts in other industries.

This paper studies online review data for Hunan tourist attractions using text mining technology. The main work includes the following points:

(1) This paper first reviews the development of tourism and user satisfaction in depth. It then summarizes previous research on consumer perceived quality and satisfaction, consumer online reviews, text sentiment analysis, and text topic models, and on this basis sets out the theoretical and practical significance of this research.

(2) This paper systematically reviews text data processing, topic models, sentiment analysis, and Python text mining technology. Text data processing mainly covers text encoding standards (processing text in a unified encoding such as UTF-8), text noise reduction (for example, removing insignificant words by building a stop-word dictionary), text segmentation (such as string-based, semantic-based, and probability-based word segmentation) and part-of-speech tagging, and vectorized text representation (mainly the TF-IDF, word2vec, and TextCNN models). Topic models mainly include the pLSA and LDA models. Sentiment orientation analysis falls into three categories: analysis based on subjective/objective classification, analysis based on machine learning algorithms, and analysis based on sentiment dictionaries. For Python text mining, this paper reviews the significance of Python's development for text mining and the main directions in which text mining is used.

(3) This paper obtains its research sample data with a web crawler written in Python. It first selects five popular tourist attractions in Hunan Province (Zhangjiajie Forest Park, Nanyue Hengshan, Yueyang Tower, Yuelu Mountain, and Dongjiang Lake) as sample attractions. Secondly, the user comment data of these scenic spots on
Ctrip is crawled with the Python web crawler, including the scenic spot name, commenting user ID, comment time, rating, and comment content. The sample data is then stored in a MySQL database, accessed from Python, for later processing.

(4) This paper conducts statistical analysis on the 33,965 reviews obtained from the 5 scenic spots. Since all 5 are very popular attractions in Hunan, their reviews show high overall scores. Dongjiang Lake has the highest score at 4.7 points, Yueyang Tower and Yuelu Mountain score 4.6, Zhangjiajie Forest Park scores 4.4, and Nanyue Hengshan scores 4.3, so every sample attraction is rated 4.3 or higher. The time distribution of user reviews differs between attractions. Zhangjiajie Forest Park, Dongjiang Lake, and Yueyang Tower receive more reviewing users from May to October, which is also the peak tourist season; Yuelu Mountain shows a sharp increase in reviewing users in October, with other months relatively stable; and the monthly reviews of Nanyue Hengshan are relatively balanced. The length distribution of review texts also differs between attractions, although only slightly: the average review length for each attraction is between 45 and 68. Dongjiang Lake has the longest average review length, followed by Nanyue Hengshan, with an average comment text length of 60 for these two scenic spots.

(5) This paper constructs a dictionary for tourism text mining. First, it downloads tourism-related lexicon packages from the Sogou cell thesaurus, obtaining 3,075 tourism-related terms; then it extracts a total of 372 terms relating to attractions and service
items from the official websites of the studied scenic spots; finally, the collected terms are merged and deduplicated, giving 3,329 terms that form the user-defined dictionary.

(6) This paper counts the high-frequency words in the user comment data and analyzes the correlation between them. Counting the high-frequency words in the comment text shows what most tourists think of a scenic spot, and correlation analysis of those words shows how likely they are to co-occur, from which an overview of the reviews of each scenic spot can be inferred. The correlation analysis of high-frequency words leads to the following conclusions:

(1) In terms of user experience: on the one hand, users find it cheaper to book scenic spot tickets through Ctrip; on the other hand, users are willing to recommend these scenic spots to their friends.

(2) In terms of scenic spot services: first, users are very satisfied with the service attitude at the scenic spots; second, the environment of the scenic spots is very beautiful and worth visiting; third, prices are generally affordable, although some users think that Zhangjiajie Forest Park is too expensive; fourth, Yuelu Mountain is free and very popular among users; fifth, users consider the supporting facilities complete; sixth, the activities on offer are good overall, although some of them charge extra, which is consistent with our perception; seventh, tickets booked online, especially through the Ctrip platform, come with discounts; eighth, users suggest booking tickets in advance through platforms such as Ctrip, which is both more affordable and very convenient.

(3) In terms of the content of the scenic spots: Aiwan Pavilion is highly correlated with Yuelu Academy, Yuelu Mountain, and Hunan University, which is mainly due to
these sites being close together, so users usually visit them on the same trip. The correlation between Dongting Lake and Yueyang Tower is also very high, mainly because the two are next to each other: the view of Dongting Lake from Yueyang Tower is very beautiful and has been famous since ancient times, and the famous ancient Chinese writer Fan Zhongyan wrote the celebrated essay "Memorial to Yueyang Tower" about this place. Hengshan is highly correlated with Nanyue, Wuyue (the Five Great Mountains), and Zhurong Peak, mainly because Hengshan, together with Taishan, Huashan, Hengshan in Shanxi, and Songshan, makes up the Five Great Mountains, and its summit is Zhurong Peak. Wulingyuan is highly correlated with Zhangjiajie and Tianzishan, mainly because Zhangjiajie is a famous tourist attraction and users usually visit Wulingyuan and Tianzishan on the same trip. Xiaodongjiang and "Wuman" (mist-covered) are highly correlated, mainly because much of the reason users visit Dongjiang Lake is to see the beauty of the mist over Xiaodongjiang.

(4) In terms of scenic spot facilities: "cable car" and "ropeway" are two high-frequency words that are closely related to "uphill" and "downhill", which reflects the main function of cable cars and ropeways.

(7) This paper performs semantic network analysis, sentiment orientation analysis, and topic model analysis on the tourism user comment data. Semantic network analysis mines the review text further and extracts its key information. It leads mainly to the following conclusions: (1) at all kinds of tourist attractions, especially mountain and river attractions, users pay particular attention to supporting facilities such as cable cars and cableways, so tourism platforms are advised to pay more attention to selling admission tickets together with cable car tickets; (2) most tourists who climb Yuelu Mountain choose to visit Aiwan Pavilion, and most tourists who describe Yueyang Tower also describe Dongting Lake. Therefore, it
is recommended that travel websites recommend these attractions to users together; (3) most descriptions of Dongjiang Lake, like those of Dongting Lake, are about the scenery, so tourists who like beautiful natural scenery can choose such attractions.

Sentiment orientation analysis determines the sentiment of the comment texts and shows how tourists feel about the scenic spots. This paper first constructs a sentiment dictionary by integrating four popular sentiment dictionaries: the simplified Chinese version of the National Taiwan University Sentiment Dictionary (NTUSD), the HowNet sentiment dictionary, the sentiment ontology dictionary, and the Chinese praise and derogation dictionary. Words of unclear orientation that appear in both the positive and the negative dictionaries are deleted, leaving 17,094 positive sentiment words and 20,138 negative sentiment words. Secondly, this paper identifies comments whose sentiment orientation was classified incorrectly by comparing the sentiment analysis results with the original rating data to obtain a confusion matrix. Some positive reviews were identified as negative, and 183 of the 1,894 negative reviews were identified as positive. The accuracy of the sentiment analysis reached 87.81%, and the sample data was classified on this basis.

Finally, this paper analyzes the topics of user comments with the topic model, so that useful information in the comment text can be obtained more intuitively. The topics of positive and negative comments are analyzed separately to find the matters that tourists focus on. Based on the research results, this paper offers tourists some suggestions for choosing tourist attractions and travel times.
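The dictionary-based sentiment step described above (merge several lexicons, drop words listed as both positive and negative, then compare predicted labels with rating-derived labels in a confusion matrix) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's code: the function names, the toy English word lists, the rule that ties count as positive, and the assumption that a rating of 4 or more means a positive review are all hypothetical, and real input would be segmented Chinese review text.

```python
from collections import Counter

def merge_lexicons(positive_sources, negative_sources):
    """Union several lexicons, then drop words that appear on both sides."""
    positive = set().union(*positive_sources)
    negative = set().union(*negative_sources)
    ambiguous = positive & negative  # words with unclear orientation
    return positive - ambiguous, negative - ambiguous

def classify(tokens, positive, negative):
    """Label a tokenized review by lexicon hits (assumption: ties are positive)."""
    score = sum((t in positive) - (t in negative) for t in tokens)
    return "pos" if score >= 0 else "neg"

def confusion_matrix(reviews, positive, negative, threshold=4):
    """Count (actual, predicted) pairs; actual label is derived from the rating.

    Assumption: rating >= threshold is treated as a positive review.
    """
    counts = Counter()
    for tokens, rating in reviews:
        actual = "pos" if rating >= threshold else "neg"
        counts[(actual, classify(tokens, positive, negative))] += 1
    return counts

def accuracy(counts):
    """Share of reviews whose predicted label matches the rating-derived label."""
    total = sum(counts.values())
    correct = counts[("pos", "pos")] + counts[("neg", "neg")]
    return correct / total if total else 0.0

# Hypothetical toy lexicons; "cheap" is listed on both sides and gets dropped.
pos_words, neg_words = merge_lexicons(
    [{"beautiful", "worth", "cheap"}, {"beautiful", "convenient", "cheap"}],
    [{"crowded", "expensive", "cheap"}, {"dirty", "crowded"}],
)

# Hypothetical tokenized reviews paired with their star ratings.
toy_reviews = [
    (["scenery", "beautiful", "worth"], 5),
    (["ticket", "expensive", "crowded"], 2),
    (["crowded", "dirty", "beautiful"], 4),
    (["convenient", "cheap"], 1),
]

counts = confusion_matrix(toy_reviews, pos_words, neg_words)
print(dict(counts))
print(f"accuracy = {accuracy(counts):.2%}")
```

The off-diagonal entries of the matrix, ("pos", "neg") and ("neg", "pos"), correspond to the misclassified comments that the abstract says were identified before the final classification.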
Keywords/Search Tags: Tourist attraction reviews, text mining, topic models, sentiment orientation analysis, Python language, ROST CM6