
Research And Implementation Of Short Text Sentiment Analysis Based On Social Theory And Imbalanced Oversampling

Posted on: 2020-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2428330596981798
Subject: Computer technology
Abstract/Summary:
Short-text sentiment analysis mines the opinions expressed in emotionally rich short texts and is an important branch of natural language processing (NLP). Its goal is to classify subjective short texts as positive or negative or, at a finer granularity, into several sentiment categories such as positive, negative, and neutral. It plays an important role in e-government, public opinion monitoring, and personalized recommendation.

Short-text sentiment analysis based on social theory is one of the important topics in text sentiment analysis. Twitter is the most widely used source of short-text data in social media. Twitter data contains not only complex social relationships between users but also the short texts of their comments. These short texts carry both a user's opinion on a topic and rich emotional information. Current research on short-text sentiment generally considers only simple friend relationships between users. It does not examine more deeply the follow relationships between users (who follows whom) or the transmission of sentiment between them, and it ignores class imbalance in the dataset, which may distort the true sentiment tendency of a text. This thesis explores these issues in three parts.

1. Statistical knowledge is combined with the SentiWordNet sentiment dictionary to construct a new Statistical emotional lexicon method (SELM). To capture sentiment transmission more deeply, users are marked as stars or regular users according to their number of followers. In addition, a social relationship impact score is calculated from the ratio of the number of accounts a user follows to the number of that user's followers. This impact score is used together with the SentiWordNet sentiment dictionary to calculate sentiment scores for short texts such as tweets. With the SELM scoring method, social relationships between users can be incorporated into traditional dictionary-based sentiment analysis. Compared with the traditional dictionary-based method, the SELM scoring method proposed in this thesis improves classification accuracy to a certain extent.

2. The Synthetic minority oversampling technique (SMOTE) is used to address the class imbalance in the publicly available U.S. Health care reform (HCR) dataset. The Sociological approach to handling noisy and short texts (SANT) is trained on the oversampled dataset, and SANT is improved to obtain ESANT (Enhance SANT). Unlike SANT, when modeling "information-information relationships," ESANT strengthens the social relationships between users in order to express deeper sentiment transmission. Experiments show that ESANT classifies better than SANT on the processed dataset. ESANT expresses the emotional influence between users more clearly and therefore judges the sentiment tendency of a short text more realistically. Compared with traditional machine-learning-based sentiment analysis methods, it significantly improves classification performance.

3. Using the SELM scoring method, the HCR dataset is divided into a deterministic set and an uncertain set. The deterministic set is used to train the ESANT model, which then performs sentiment analysis on the tweets in the uncertain set. Experiments show that combining the SELM scoring method with the ESANT model further improves classification performance.
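The oversampling step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed on the line segment between a minority-class point and one of its k nearest minority-class neighbours. The abstract does not state the feature representation, the value of k, or the implementation used in the thesis, so the function name `smote`, its parameters, and the toy vectors below are assumptions made for illustration only. As a simplification, base points are drawn at random rather than visited in turn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Illustrative sketch only; names and defaults are assumptions, not the thesis code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position along the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors standing in for vectorised tweets (made-up numbers).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the minority class grows without duplicating tweets exactly. In practice the synthetic points would be appended to the minority class until the class counts are closer to balanced, and only the training split would be oversampled.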
Keywords/Search Tags:short text, sentiment analysis, sentimental transmission, sentiment scoring method, oversampling