Short Texts Sentiment Classification Model Based On Large-Scale Word Frequency Features

Posted on:2020-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:B Liu

GTID:2518305972964709

Subject:Management Science and Engineering

Abstract/Summary:

With the development of social media, user-generated content (UGC) has grown explosively. Much of this content carries specific emotional information and appears as short text, mainly expressing users’ opinions on things or behaviors. If valuable emotional information can be mined from these large amounts of data by effective methods, it will be significant for individuals, enterprises, government, state and society in formulating policies and rules. The era of big data has arrived: such huge data is an opportunity for research on sentiment classification, but it also brings great challenges. These challenges are mainly manifested in the high dimensionality and sparseness of short text features and the resulting explosion of dimensionality. Uncertainty in the feature selection process has a great influence on the stability of the sentiment classifier. When faced with large-scale data, small dictionaries or knowledge bases have limited feature coverage and poor cross-domain applicability. This thesis introduces the related short text sentiment classification methods and discusses their problems. To address these problems, and based on Occam’s razor, the law of large numbers and the idea of TF-IDF, it proposes a new method of extracting word frequency features from a large-scale tagged corpus and designs a short text sentiment classification model named the Large Scale Whole Word Frequency Feature Model (LSWWFFM). Compared with traditional short text methods, this model simplifies the feature selection process by incorporating all text features into the model. The model thus not only reduces the uncertainty introduced by feature selection but also improves feature coverage. The experimental results show that the proposed model is less sensitive to feature dimension, and that the influence of corpus size on the efficiency of the model diminishes marginally as the corpus grows. At the same time, the accuracy of the model increases with corpus size, which shows that the model can achieve good performance in a big data environment. A comparison with five stable classifiers (Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest and Neural Network) demonstrates the validity of the Large Scale Whole Word Frequency Feature Model. In addition, the classification results on hotel reviews and takeaway reviews show that the proposed model has relatively strong cross-domain applicability and good generalization performance.
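
The abstract does not give the formulas behind LSWWFFM, so the following is only a minimal sketch of the general idea it describes: derive a score for every word from its frequencies in a labeled corpus, with no feature selection step, and classify a short text by summing the scores of its words. The scoring rule used here (a smoothed log-ratio of class-relative frequencies, similar to multinomial Naive Bayes without a class prior) is an assumption, as are the function names `train_word_scores` and `classify`, the whitespace tokenizer, and the toy corpus.

```python
import math
from collections import Counter


def train_word_scores(labeled_texts):
    """Map every word in the corpus to a polarity score (no feature selection)."""
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in labeled_texts:
        tokens = text.lower().split()  # assumed tokenizer: whitespace split
        (pos_counts if label == "pos" else neg_counts).update(tokens)

    vocab = set(pos_counts) | set(neg_counts)
    # Add-one smoothing so words seen in only one class get a finite score.
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)

    scores = {}
    for word in vocab:
        pos_freq = (pos_counts[word] + 1) / pos_total
        neg_freq = (neg_counts[word] + 1) / neg_total
        scores[word] = math.log(pos_freq / neg_freq)  # > 0 leans positive
    return scores


def classify(text, scores):
    """Sum the scores of known words; unseen words contribute nothing."""
    total = sum(scores.get(token, 0.0) for token in text.lower().split())
    return "pos" if total >= 0 else "neg"


# Toy labeled corpus (hypothetical data, for illustration only).
corpus = [
    ("great room and friendly staff", "pos"),
    ("delicious food fast delivery", "pos"),
    ("dirty room and rude staff", "neg"),
    ("cold food slow delivery", "neg"),
]
scores = train_word_scores(corpus)
print(classify("friendly staff great food", scores))
print(classify("rude and slow", scores))
```

Because every word in the corpus keeps a score, words that occur equally often in both classes (such as "staff" in the toy corpus) end up near zero and do not need to be filtered out explicitly.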

Keywords/Search Tags:

Short text, Text mining, Sentiment analysis, Big data, Opinion mining

Related items

1	Research On The Opinion Mining And Hidden Sentiment Inclination For Web Text
2	Research On Approximate Text Analysis Based Opinion Mining
3	Research On Key Problems In WEB Text Mining
4	Research On Short Text Sentiment Analysis And Its Applications
5	Research And Development Of A Collection And Opinion Mining System For Online Comments
6	Research And Implementation Of Text Mining System For Big Data Accurate Investment Promotion
7	Research And Implementation Of Fine-grained Opinion Mining System For Word-of-mouth Monitoring
8	The Research And Implementation Of Massive Short Message Mining Technology
9	Research And Application Of Sentiment Mining Model Based On Text Analysis
10	Opinion Mining Based Sentiment Analysis For Online Products Reviews Research And Application