Short Texts Sentiment Classification Model Based On Large-Scale Word Frequency Features

Posted on: 2020-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
GTID: 2518305972964709
Subject: Management Science and Engineering
Abstract/Summary:
With the development of social media, user-generated content (UGC) has grown explosively. Most of this content appears as short texts that carry specific emotional information and mainly express users' opinions on things or behaviors. If effective methods can mine valuable emotional information from these large amounts of data, the results will be significant for individuals, enterprises, governments, the state, and society in formulating policies and rules. The era of big data has arrived: such huge volumes of data are an opportunity for sentiment classification research, but they also bring great challenges. These challenges mainly manifest as the high dimensionality and sparseness of short-text features and the dimensionality explosion that occurs with high-dimensional features. Uncertainty in the feature selection process strongly affects the stability of a sentiment classifier, and when faced with large-scale data, smaller dictionaries or knowledge bases have limited feature coverage and poor cross-domain applicability.

This thesis reviews existing short-text sentiment classification methods and discusses their problems. To address these problems, and drawing on Occam's razor, the law of large numbers, and the idea behind TF-IDF, it proposes a new method for extracting word frequency features from a large-scale labeled corpus and designs a short-text sentiment classification model named the Large Scale Whole Word Frequency Feature Model (LSWWFFM). Compared with traditional short-text methods, the model simplifies feature selection by incorporating all text features. This reduces the uncertainty introduced by feature selection and also improves feature coverage.

Experimental results show that the proposed model is less sensitive to feature dimensionality and that the marginal effect of corpus size on the model's efficiency diminishes. At the same time, the model's accuracy increases as the corpus grows, which indicates that the model can perform well in big-data environments. Comparison with five well-established classifiers (Naive Bayes, logistic regression, support vector machine, random forest, and neural network) demonstrates the validity of LSWWFFM. In addition, classification results on hotel reviews and takeaway reviews show that the model has relatively strong cross-domain applicability and good generalization performance.
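The abstract says LSWWFFM extracts word frequency features from a large-scale labeled corpus and keeps every word instead of selecting a subset, but it does not state the model's formulas. The Python sketch below is therefore only a hypothetical illustration of that general idea, not the thesis's method: the names `word_polarity_table` and `classify`, the class-normalized document-frequency polarity score, and the mean-score decision rule are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def word_polarity_table(corpus: Iterable[Tuple[List[str], int]]) -> Dict[str, float]:
    """Hypothetical feature table (not the thesis's formula).

    corpus: (tokens, label) pairs, label 1 = positive, 0 = negative.
    Returns a polarity in [-1, 1] for EVERY word seen, i.e. no feature
    selection step, computed from class-normalized document frequencies.
    """
    df_pos: Counter = Counter()
    df_neg: Counter = Counter()
    n_pos = n_neg = 0
    for tokens, label in corpus:
        unique = set(tokens)  # document frequency: count a word once per text
        if label == 1:
            df_pos.update(unique)
            n_pos += 1
        else:
            df_neg.update(unique)
            n_neg += 1

    table: Dict[str, float] = {}
    for word in df_pos.keys() | df_neg.keys():
        # Share of positive / negative texts containing the word, so an
        # imbalanced corpus does not bias the score toward the larger class.
        p = df_pos[word] / n_pos if n_pos else 0.0
        q = df_neg[word] / n_neg if n_neg else 0.0
        table[word] = (p - q) / (p + q)  # p + q > 0: the word occurs somewhere
    return table


def classify(tokens: List[str], table: Dict[str, float]) -> int:
    """Assumed decision rule: mean polarity of known words, 1 if > 0 else 0."""
    scores = [table[w] for w in tokens if w in table]
    if not scores:
        return 0  # hypothetical default when no word was seen in training
    return 1 if sum(scores) / len(scores) > 0 else 0


if __name__ == "__main__":
    corpus = [
        (["room", "clean", "great"], 1),
        (["staff", "friendly", "great"], 1),
        (["room", "dirty", "noisy"], 0),
        (["food", "cold", "noisy"], 0),
    ]
    table = word_polarity_table(corpus)
    print(classify(["clean", "friendly", "room"], table))  # 1
    print(classify(["cold", "dirty"], table))  # 0
```

Normalizing document frequencies by class size keeps a word's polarity from simply reflecting which class has more training texts; the actual LSWWFFM may weight words quite differently.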
Keywords/Search Tags: Short text, Text mining, Sentiment analysis, Big data, Opinion mining