Research And Implementation Of Anomaly Detection System In Tourism Big Data

Posted on: 2021-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Yu
Full Text: PDF
GTID: 2428330623468149
Subject: Software engineering
Abstract/Summary:
As the number of tourists grows, the range of destinations widens, and travel choices diversify, the safety issues that accompany tourism have become increasingly complicated. Tourism safety accidents occur from time to time and harm tourists' lives and property. Research on detecting abnormal events in tourism is therefore of practical value. Abnormal events in tourism include natural disasters, accidents, public health events, and social security events. For social security events, this thesis uses natural language processing and machine learning to identify tourist complaints and negative comments and to detect tourism anomalies (tour guide violations). The results support the work of tourism regulators and give them a reliable basis for decisions.

First, this thesis proposes a method for identifying negative tourism reviews based on hierarchical attention networks (HAN). The tourism supervision system related to this thesis is introduced, and the types of tourism anomalies detected in this thesis are then described. This leads to the task of tourism sentiment analysis. Before classification, the tourism complaint text is preprocessed, including word segmentation, stop-word removal, and text representation. The core of the method is the hierarchical attention network model, which captures the hierarchical structure of documents by applying two levels of attention, one at the word level and one at the sentence level. Experiments show that the HAN-based method achieves the best results among the commonly used methods compared, which indicates that it is effective for recognizing negative tourism opinions.

Second, this thesis proposes a method for identifying the abnormal behaviors of illegal guides based on the gradient boosting decision tree (GBDT) algorithm. Illegal guides' behaviors are first divided into five categories according to existing regulations and systems. A tourism word embedding model based on Word2Vec is then constructed from a travel news corpus, which makes the text representation more accurate. To address the insufficient accuracy and long training time of existing text classification algorithms, a gradient boosting decision tree algorithm is introduced and combined with gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). Experiments show that this method achieves higher accuracy and takes less training time on the illegal guide behavior detection task.

Finally, this thesis proposes a hybrid method of complaint text augmentation based on contextual augmentation and back translation. The contextual augmentation method extends the common synonym replacement operation with a context-aware word prediction module, so that words in a sentence can be added, deleted, or changed while the label of the original text remains unchanged. Because the training corpus for the illegal guide behavior detection task is small, this method is used to augment the complaint corpus. Experiments show that the hybrid method improves recognition performance on the tasks and outperforms either of the two methods used alone.
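The abstract names gradient-based one-side sampling (GOSS) without describing it. The sketch below illustrates the general idea as it is usually stated: keep the training instances with the largest absolute gradients, draw a random sample of the remaining ones, and up-weight that sample so the overall gradient statistics stay roughly unbiased. It is a minimal illustration, not the thesis's implementation; the function name `goss_sample` and the parameters `top_rate`, `other_rate`, and `seed` are assumptions made for this example.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.5, seed=0):
    """Select instances by gradient-based one-side sampling (illustrative).

    Returns (indices, weights): the chosen instance indices and the weight
    to apply to each one when accumulating gradient statistics.
    """
    if not (0 < top_rate < 1) or not (0 < other_rate <= 1 - top_rate):
        raise ValueError("rates must satisfy 0 < top_rate < 1 and "
                         "0 < other_rate <= 1 - top_rate")

    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)

    top_n = round(n * top_rate)
    other_n = round(n * other_rate)

    top = order[:top_n]    # large-gradient instances: always kept
    rest = order[top_n:]   # small-gradient instances: sampled

    rng = random.Random(seed)
    sampled = rng.sample(rest, min(other_n, len(rest)))

    # Sampled small-gradient instances are amplified by (1 - a) / b,
    # where a = top_rate and b = other_rate.
    amplify = (1.0 - top_rate) / other_rate

    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights


if __name__ == "__main__":
    # Hypothetical per-instance gradients from one boosting iteration.
    grads = [0.1, -2.0, 0.05, 1.5, -0.2, 0.3, 0.01, -0.4, 0.02, 0.9]
    idx, w = goss_sample(grads)
    for i, weight in zip(idx, w):
        print(i, grads[i], weight)
```

Because only a fraction of the instances is used to evaluate each split, training is cheaper, which is consistent with the shorter training time reported above.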
Keywords/Search Tags: tourism big data, abnormal events, sentiment analysis, illegal guides' behavior detection, text data augmentation