
Research On Sentiment Classification Of Chinese Short Texts About Hotel Reviews

Posted on: 2019-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Xu
GTID: 2438330566490775
Subject: Signal and Information Processing
Abstract/Summary:
As a special case of text classification, sentiment classification research focuses mainly on Chinese word segmentation, text representation, and feature extraction. In recent years researchers have studied these issues extensively and achieved many good results, but several problems remain open. For Chinese word segmentation, many effective methods have been proposed for new word discovery and out-of-vocabulary word recognition; however, segmentation also loses the sentiment information expressed by function words, and no good solution exists. Word vectors generated by Word2vec contain semantic information but do not express the sentiment of words, so how to add sentiment information to word vectors is an important current research topic. Review texts also contain expressions unrelated to the topic, that is, noise; traditional feature selection methods are not applicable here, and how to de-noise such text effectively requires further study. Building on previous studies, this thesis does the following work.

First, improving the segmentation effect. Existing word segmentation methods handle function words in a way that does not suit sentiment classification: a function word combined with a content word that carries no sentiment on its own can still express a sentiment tendency, and segmentation splits such combination units, losing that part of the sentiment information. To solve this problem, the thesis uses jieba as the basic word segmentation tool, draws on the characteristics of the N-gram language model, and uses the HowNet sentiment dictionary to extract common combination units from a hotel-domain review corpus in order to build a custom segmentation dictionary. The text is then segmented again with the custom dictionary as a supplement.

Second, constructing the emotional word vector. Word2vec captures latent semantic associations between words and generates word vectors from a large-scale corpus, but these vectors do not contain the sentiment information of words. To solve this problem, the thesis assumes that every word carries sentiment information and is distributed in both the positive and the negative sentiment space. On this basis, an emotion weight calculation method is proposed, and the original word vector is weighted by the emotion weight to obtain the emotional word vector.

Third, proposing the attribute matching de-noising algorithm. For off-topic noise in the text, the thesis analyzes the forms the noise takes and proposes an attribute matching de-noising algorithm. The algorithm first constructs an attribute dictionary based on the LDA topic model, then slices the text into segments and matches each segment against the attribute dictionary. Unmatched segments are treated as noise and deleted, which completes the de-noising of the text.

Fourth, comparison experiments. Four experiments were designed to verify the three improvements above. The experiments use jieba for word segmentation, Word2vec to train the original word vectors, and an LSTM model for classification. The results show that adding the custom segmentation dictionary raises sentiment classification accuracy from 87.15% to 88.15%; on top of the improved segmentation, constructing emotional word vectors raises accuracy from 88.15% to 92.05%; and further adding the attribute matching de-noising algorithm raises accuracy from 92.05% to 92.55%. These results verify the effectiveness of the corresponding methods and algorithms.
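
The attribute matching de-noising step described in the abstract (slice the text, match each segment against an attribute dictionary, delete unmatched segments) might look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the thesis's implementation: the attribute words are hand-picked here rather than derived from an LDA topic model, the text is sliced at punctuation marks, and matching is a plain substring test. The names `ATTRIBUTE_WORDS`, `split_segments`, and `denoise` are hypothetical.

```python
import re

# Hypothetical attribute dictionary. The thesis builds its dictionary from an
# LDA topic model; these entries are hand-picked for illustration only.
ATTRIBUTE_WORDS = {"房间", "服务", "早餐", "位置", "价格", "卫生"}

# Assumption: a review is sliced into segments at these punctuation marks.
_DELIMITERS = re.compile(r"([，。！？；,.!?;])")


def split_segments(text):
    """Slice a review into clauses, keeping each clause's trailing punctuation."""
    # With a capturing group, re.split alternates clause, delimiter, clause, ...
    parts = _DELIMITERS.split(text)
    segments = []
    for i in range(0, len(parts), 2):
        clause = parts[i].strip()
        punct = parts[i + 1] if i + 1 < len(parts) else ""
        if clause:
            segments.append(clause + punct)
    return segments


def denoise(text, attributes=ATTRIBUTE_WORDS):
    """Keep only segments that mention at least one attribute word.

    Assumption: matching is a substring test on the raw segment; the thesis
    may instead match on segmented tokens.
    """
    kept = [
        segment
        for segment in split_segments(text)
        if any(word in segment for word in attributes)
    ]
    return "".join(kept)


if __name__ == "__main__":
    review = "房间很干净，今天天气不错，服务也很热情。"
    print(denoise(review))
```

In this sketch the middle clause of the sample review (a remark about the weather) mentions no hotel attribute, so it is dropped while the clauses about the room and the service are kept. A substring test is the simplest possible matching rule; a version closer to the described pipeline would presumably match against jieba tokens and a dictionary learned from the review corpus.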
Keywords/Search Tags: Chinese word segmentation, emotional word vector, emotional weighting, attribute matching de-noising