Text Sentiment Classification Based On NB-TextCNN Combinatorial Algorithm On Imbalanced Class Samples

Posted on:2023-07-15

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Shi

Full Text:PDF

GTID:2557307070473584

Subject:Statistics

Abstract/Summary:

Text sentiment classification is crucial to public opinion analysis and personalized recommendation on network platforms. However, in datasets where sentiment categories are unevenly distributed, traditional methods often ignore the categories with few samples. This thesis first reviews the research status of machine learning and deep learning algorithms in text sentiment classification, especially on imbalanced samples, and introduces the relevant theory of text representation and text sentiment classifiers in detail. Second, data preprocessing was performed on the 48,875 samples of the NLPCC2014 dataset. Statistical analysis showed that the dataset contains noise and that the emotion categories are extremely imbalanced. To address this, a set of sample enhancement methods was proposed. Each sample was enhanced a number of times that depends on its category, and each enhancement applied one randomly selected operation: homophone replacement, synonym replacement, random word deletion, or random word copying. This method brought the number of samples in each category closer to balance and simulated the noise of the original samples, which improves the robustness of the model. Then, the NB-TextCNN combinatorial algorithm for emotion classification was established. Specifically, NB was optimized by replacing its prior probability distribution so that it adapts to the samples both before and after enhancement. The NB prediction results were concatenated with the text features extracted by the TextCNN pooling layer to obtain new features that fuse semantics and category distributions. After each element of the fused feature was weighted, the feature was fed into the fully connected layer of TextCNN, and a Softmax transformation produced the classification results. Finally, text sentiment classification was performed using the sample enhancement method and the NB-TextCNN combinatorial algorithm described above. The results show that the weighted-average F1 score of the combined algorithm studied in this thesis is 4.4% higher than that of the single TextCNN algorithm, and that NB-TextCNN outperforms the three basic algorithms TextRNN, FastText and Transformer. While maintaining the prediction accuracy on the other emotion categories, the sample-enhanced NB-TextCNN improves the F1 scores of the two emotion categories with the fewest samples from 0 to 0.28 and from 0.15 to 0.33, respectively. The F1 scores of the other four basic algorithms also improve to varying degrees after sample enhancement.
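
The sample enhancement step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the lookup-table format for synonyms and homophones, and the rule "number of extra copies = largest class size // own class size - 1" are all assumptions made for illustration, since the abstract only states that the number of enhancements depends on the category.

```python
import random
from collections import Counter


def random_delete(tokens, rng):
    """Drop one randomly chosen token (no-op on samples with fewer than two tokens)."""
    if len(tokens) <= 1:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def random_copy(tokens, rng):
    """Duplicate one randomly chosen token next to itself."""
    if not tokens:
        return []
    i = rng.randrange(len(tokens))
    return tokens[:i + 1] + tokens[i:]


def replace_from_table(tokens, table, rng):
    """Swap one token for an alternative from a lookup table.

    `table` maps a token to a list of replacements; the same helper serves
    synonym and homophone replacement (hypothetical table format).
    """
    candidates = [i for i, t in enumerate(tokens) if table.get(t)]
    if not candidates:
        return list(tokens)
    i = rng.choice(candidates)
    out = list(tokens)
    out[i] = rng.choice(table[tokens[i]])
    return out


def augment_to_balance(samples, synonyms, homophones, seed=0):
    """Return the original samples plus augmented copies of minority classes.

    `samples` is a list of (tokens, label) pairs. The per-class repetition
    rule below is an assumption, not the rule used in the thesis.
    """
    rng = random.Random(seed)
    ops = [
        lambda t: replace_from_table(t, homophones, rng),
        lambda t: replace_from_table(t, synonyms, rng),
        lambda t: random_delete(t, rng),
        lambda t: random_copy(t, rng),
    ]
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    out = list(samples)
    for tokens, label in samples:
        # Smaller classes receive more augmented copies per sample.
        n_extra = target // counts[label] - 1
        for _ in range(n_extra):
            # Each enhancement applies one randomly selected operation.
            out.append((rng.choice(ops)(tokens), label))
    return out
```

With four samples in one class and one in another, this rule generates three augmented copies of the single minority sample, so both classes end up with four samples. Because the quotient is rounded down, classes whose sizes do not divide the largest class size evenly are only brought close to balance, which matches the abstract's wording that the class sizes "tend to be balanced".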

Keywords/Search Tags:

text sentiment classification, imbalanced class samples, Naïve Bayes, TextCNN, combinatorial algorithm

Related items

1	Improved Naive Bayes Algorithm With Application To Text Classification
2	Chinese Text Categorization Method And Implementation
3	Short Text Topic Mining Of Hotel Comments Based On Emotional Classification
4	Research On Imbalanced Classification Model In User Churn Identification
5	Research On Negative Comments Of Online Courses For Unbalanced Data
6	Research On News Classification And Recommendation Method Of Taiyuan Education Bureau Government Affairs Big Data Platform
7	Multi-label Classification Of Company Announcement Headlines Based On TextCNN
8	A New Classification Model For Imbalanced Classification
9	Chinese Text Classification Based On Statistical Method
10	Research On The Classification Method Of Textbook Moral Items Based On Deep Learning