
Research On Sentiment Analysis Of Hotel Reviews Based On Stacking

Posted on: 2020-10-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Wu | Full Text: PDF
GTID: 2428330599453655 | Subject: engineering
Abstract/Summary:
With the rapid development of the Internet, people often book hotels online. However, hotel reviews on the Internet vary widely, and browsing such a large volume of information manually is time-consuming and laborious. Using machine learning to classify the sentiment of large numbers of hotel reviews makes online booking more convenient for consumers and is also valuable to hotel businesses and Internet platforms. For sentiment analysis of hotel reviews, the commonly used TF-IDF weighting method considers only a feature's term frequency and document frequency. In addition, a single machine learning model has inherent weaknesses that often degrade its text classification results. This thesis therefore improves the traditional TF-IDF weighting method and proposes an ensemble sentiment classification model. The main work is as follows:

(1) Preprocessing the hotel review texts. The review texts are first cleaned: duplicates and meaningless characters are removed, the category labels of reviews are corrected, and so on. Then, to identify sentiment-polarity words accurately, a sentiment dictionary is constructed from common existing dictionaries and manually extracted hotel-specific sentiment words. Finally, to address the recognition of sentiment words and of new words during word segmentation, a custom dictionary is introduced when segmenting the review texts.

(2) Feature extraction is performed on the preprocessed texts with Word2Vec to obtain the feature vectors of the texts. Because the traditional TF-IDF weighting method ignores both the degree of association between features and categories and the important influence of sentiment words on sentiment classification, this thesis proposes an improved TF-IDF weighting method that fully considers how features are distributed across text classes and also increases the weights of sentiment features. Experimental comparison shows that the improved method achieves higher sentiment classification accuracy and F-score than the traditional TF-IDF weighting method.

(3) Common machine learning algorithms are used to classify the hotel review texts. Based on the classification results, Random Forest, SVM, and KNN are selected as the base classifiers for Stacking. However, the Stacking ensemble turns out to have lower accuracy and F-score than Random Forest, the best-performing base classifier. To address this, the thesis improves the classification performance of SVM and KNN, which perform relatively poorly among the base classifiers. First, an improved AdaBoost ensemble method is used to strengthen SVM classification. Then two weighting schemes for KNN are compared, and KNN is ensembled with the Random Subspace method to mitigate its curse-of-dimensionality problem. Finally, the effectiveness of the proposed sentiment classification model is verified by comparing Stacking results with and without the enhanced base classifiers.
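
The abstract does not state the formula of the improved TF-IDF weighting, so the following is only an illustrative sketch of the two ideas it names: scaling a term's weight by how unevenly the term is distributed across classes, and raising the weight of sentiment-dictionary words. The function name `weighted_tfidf`, the `class_factor` concentration measure, the smoothed IDF, and the `boost` multiplier are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def weighted_tfidf(docs, labels, sentiment_words, boost=2.0):
    """Return one {term: weight} dict per document.

    docs: list of token lists; labels: class label for each document.
    Hypothetical weighting: tf * smoothed idf * class factor * sentiment boost.
    """
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_sizes = Counter(labels)

    # Document frequency of each term, overall and within each class.
    df = Counter()
    df_by_class = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class[label][term] += 1

    def class_factor(term):
        # Assumed concentration measure: the largest within-class document
        # rate divided by the sum of rates over all classes. It equals
        # 1/len(classes) for an evenly spread term and 1.0 for a term that
        # occurs in a single class only.
        rates = [df_by_class[c][term] / class_sizes[c] for c in classes]
        return max(rates) / sum(rates)

    weights = []
    for tokens in docs:
        row = {}
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            weight = tf * idf * class_factor(term)
            if term in sentiment_words:
                weight *= boost  # assumed constant boost for sentiment words
            row[term] = weight
        weights.append(row)
    return weights
```

In this sketch a word such as "room" that appears equally often in positive and negative reviews is down-weighted by the class factor, while a sentiment word that appears in only one class keeps its full weight and is then multiplied by `boost`.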
Keywords/Search Tags:sentiment analysis, feature vector, weight calculation, machine learning, ensemble learning