| These comments from Weibo, WeChat, and TikTok are characterized by short text, timeliness, and interactivity; they can easily be forwarded in large numbers and attract wide attention within a short time, thus forming hot topics of public opinion. Tourism management departments and practitioners can analyze text comments on tourism service quality, safety, environment, and other topics to discover popular topics, which is of great significance for guiding public opinion and formulating countermeasures to improve service quality. Traditional methods for mining public opinion hot spots are mostly designed for long texts and are ill-suited to short texts, which carry little information, cover a single topic, and lack surrounding context. The Biterm Topic Model (BTM) is a traditional topic discovery method for short texts, but it does not consider big data conditions and therefore faces time constraints when mining hot topics in large-scale short texts. This article focuses on that problem: under the Spark computing framework, and combining the techniques and theory of the BTM model and the K-means algorithm, we study parallel topic discovery methods for tourism public opinion. The specific work includes the following points: (1) We analyze and compare the current state of research, both domestic and international, on methods for discovering public opinion in online tourism, and study in depth the theory behind classic hot topic discovery methods such as the BTM model and the K-means algorithm, as well as the parallel computing framework Spark. (2) We propose a parallel algorithm for discovering hot topics in tourism public opinion, based on the Spark computing framework, the BTM model, and the K-means algorithm, to address the timeliness limitation that BTM-based topic discovery methods face on massive short texts because they do not consider big data conditions. By parallelizing word pair generation, computation of the document-topic distribution matrix, document similarity calculation, and the clustering process for sets of tourism reviews and Weibo short texts on the Spark framework, the algorithm shortens the discovery time of hot topics and improves real-time performance. At the same time, because the K-means algorithm depends on its initial cluster centers and therefore easily falls into local extrema, the results of the BTM model are used to constrain the selection of the initial cluster centers, which excludes discrete (outlier) points, unsuitable distances between cluster centers, and similar cases, and thus improves the accuracy of K-means clustering. Experimental results show that the speedup (acceleration ratio) and scalability of the algorithm are greatly improved compared with the single model, so the algorithm better meets the application requirements of hot topic discovery in tourism public opinion. (3) Finally, we design and implement a visual analysis system for network hot events. The system integrates Weibo user data with blog post data to support visual analysis of tourism public opinion from both overall and local perspectives. It analyzes the overall situation of Weibo hot events from three aspects (event proportion, gender proportion, and event development), supports blog search with complex conditions, analyzes regional events using geographic information, and uses keyword co-occurrence graphs to mine hot events in tourism public opinion in depth. |
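
The word pair generation step mentioned in point (2) can be illustrated with a minimal sketch. It assumes, as in the standard BTM formulation, that a biterm is an unordered pair of words co-occurring in the same short text. The function names `biterms` and `count_biterms` and the sample comments are hypothetical, and the sketch runs on a single machine in plain Python rather than on Spark; it is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def biterms(tokens):
    """Return the unordered word pairs (biterms) of one tokenized short text."""
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Count biterms over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        # Each document is handled independently of the others, which is
        # what makes this step straightforward to distribute.
        counts.update(biterms(tokens))
    return counts


# Hypothetical tokenized comments, invented purely for illustration.
docs = [
    ["hotel", "service", "slow"],
    ["hotel", "price", "service"],
]
counts = count_biterms(docs)
print(counts.most_common(1))
```

Because `biterms` depends only on a single document, a distributed version could plausibly apply it to each record of a document collection (for example with a flat-map style operation) and then aggregate the counts by key; how the paper actually partitions the work is not stated in the abstract.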