
Research On Sentiment Analysis Method Of Microblogging Related To Banks

Posted on: 2019-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 2348330542991131
Subject: Communication and Information System
Abstract/Summary:
As a typical social network application, microblogging has a growing influence on daily life and has attracted the interest of many scholars. Its large user base produces a great deal of information every day, so analyzing the sentiment behind this information has considerable commercial and social value, and sentiment analysis of microblogs has become a hot research topic. Because microblog posts are short texts, their sentiment analysis involves several core steps, especially feature selection, feature extraction and the handling of domain knowledge differences, all of which directly affect accuracy. This paper focuses on microblog text related to banking business. A basic stoplist lacks domain applicability, and traditional TF-IDF lacks information about the distribution of words and the semantic relations of the context. To address both problems, this paper proposes a bank-stoplist and a hybrid algorithm based on LSA and an improved TF-IDF. The work is arranged as follows:

(1) A bank-stoplist is constructed. A stoplist is used mainly to improve retrieval efficiency and to save time and space during information retrieval. However, different domains place different requirements on the content of a stoplist, and none of the existing stoplists is domain-specific. Starting from a basic stoplist, this paper supplements it using term frequency and document frequency, and uses a sentiment dictionary to delete the emotional words it contains. The experimental results show that the bank-stoplist is better than a traditional stoplist in both performance and content: it greatly reduces the meaningless words in the text, which reduces text noise and improves the accuracy of feature selection.

(2) A hybrid algorithm based on LSA and an improved TF-IDF is proposed. Traditional TF-IDF selects feature items mainly by term frequency. Its disadvantage is that it considers only the numerical calculation and ignores both the distribution of words within and between classes and the semantic relationship between contexts. To solve these problems, this paper first improves the IDF formula by introducing the distribution of feature items within and between classes, which addresses the word distribution problem. It then introduces latent semantic analysis (LSA), which identifies the similarity between words and so addresses the semantic relations that TF-IDF ignores.

(3) Simulation comparison experiments verify the performance of the stoplist and of the improved hybrid algorithm constructed in this paper. The extracted features are used to train four classifiers: naive Bayes, logistic regression, LIBSVM and LIBLINEAR. The proposed methods are evaluated in terms of accuracy, recall and F-value. The results show that the accuracy, recall and F-value of sentiment classification improve by about 1% with the bank-stoplist, and by about 3% with the improved hybrid algorithm.
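The stoplist construction described in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `build_domain_stoplist`, the two threshold parameters and the toy corpus are all hypothetical, since the abstract does not state how term frequency and document frequency are combined or which cut-offs are used. The sketch only shows the general idea of adding words that are both frequent and widespread to a basic stoplist, then removing any word found in a sentiment dictionary.

```python
from collections import Counter


def build_domain_stoplist(docs, base_stoplist, sentiment_words,
                          min_tf_share=0.01, min_df_share=0.8):
    """Extend a basic stoplist with domain words, then drop sentiment words.

    docs            -- list of tokenized documents (each a list of strings)
    base_stoplist   -- general-purpose stop words
    sentiment_words -- words from a sentiment dictionary, never treated as stop words
    min_tf_share    -- hypothetical threshold: share of all tokens in the corpus
    min_df_share    -- hypothetical threshold: share of documents containing the word
    """
    n_docs = len(docs)
    total_tokens = sum(len(doc) for doc in docs)

    # Term frequency: how often a word occurs in the whole corpus.
    tf = Counter(tok for doc in docs for tok in doc)
    # Document frequency: in how many documents a word occurs at least once.
    df = Counter(tok for doc in docs for tok in set(doc))

    # Words that are both frequent and spread across most documents carry
    # little discriminative information in this domain.
    candidates = {
        tok for tok in tf
        if tf[tok] / total_tokens >= min_tf_share
        and df[tok] / n_docs >= min_df_share
    }

    # Supplement the basic stoplist, then delete emotional words from it.
    return (set(base_stoplist) | candidates) - set(sentiment_words)


# Toy corpus (made up for illustration only).
docs = [
    ["the", "bank", "card", "is", "good"],
    ["the", "bank", "app", "is", "bad"],
    ["the", "bank", "fee", "is", "good"],
    ["the", "bank", "loan", "is", "slow"],
]
base = {"the", "is", "good"}
sentiment = {"good", "bad", "slow"}

stoplist = build_domain_stoplist(docs, base, sentiment)
print(sorted(stoplist))
```

In this toy corpus "bank" appears in every document and is added to the stoplist, while "good" is removed from the basic stoplist because it carries sentiment.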
Keywords/Search Tags:microblogging, sentiment analysis, LSA, TF-IDF, Stoplist, eigenvectors