| Text sentiment classification is central to public opinion analysis and personalized recommendation on online platforms. However, on datasets with an uneven distribution of sentiment labels, traditional methods often neglect the categories that have few samples. This thesis first reviews the state of research on machine learning and deep learning algorithms for text sentiment classification, especially on imbalanced samples, and introduces the relevant theory of text representation and text sentiment classifiers in detail. Second, the 48,875 samples of the NLPCC2014 dataset are preprocessed. Statistical analysis shows that the dataset contains noise and that the emotion categories are extremely imbalanced. To address this, a set of sample enhancement methods is proposed: each sample is enhanced a number of times that depends on its category, and each enhancement randomly selects among homophone replacement, synonym replacement, random word deletion, and random word copying. This brings the number of samples in each category closer to balance and simulates the noise of the original samples, which improves the robustness of the model. Then, a combined emotion classification algorithm, NB-TextCNN, is established. Specifically, the naive Bayes (NB) prior probability distribution is replaced so that it adapts to the samples before and after enhancement, which completes the NB optimization. The NB prediction results are concatenated with the text features extracted by TextCNN pooling to obtain new features that fuse semantics and category distributions. Each element of the fused feature is weighted, and the result is fed to the fully connected layer of TextCNN, followed by a Softmax transformation that outputs the classification result. Finally, text sentiment classification is performed with the sample enhancement method and the NB-TextCNN combined algorithm described above. The results show that the weighted-average F1 of the combined algorithm studied in this thesis is 4.4% higher than that of TextCNN alone, and that NB-TextCNN outperforms the three basic algorithms TextRNN, FastText, and Transformer. While preserving prediction accuracy on the other emotion categories, the sample-enhanced NB-TextCNN improves the F1 scores of the two emotion categories with the fewest samples from 0 to 0.28 and from 0.15 to 0.33, respectively. The F1 scores of the other four basic algorithms also improve to varying degrees after sample enhancement. |
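
The category-dependent sample enhancement summarized in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the lookup tables `HOMOPHONES` and `SYNONYMS`, the function names, and the `times_per_label` mapping are all hypothetical, and English toy tokens stand in for the segmented NLPCC2014 text.

```python
import random

# Hypothetical toy lookup tables; real homophone and synonym
# dictionaries would be much larger and are not given in the abstract.
HOMOPHONES = {"their": ["there"], "to": ["too", "two"]}
SYNONYMS = {"happy": ["glad", "joyful"], "sad": ["unhappy"]}


def replace_from(table, tokens, rng):
    """Replace one randomly chosen token that has an entry in `table`."""
    candidates = [i for i, tok in enumerate(tokens) if tok in table]
    out = list(tokens)
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(table[out[i]])
    return out


def delete_random(tokens, rng):
    """Delete one random token, keeping at least one token."""
    out = list(tokens)
    if len(out) > 1:
        del out[rng.randrange(len(out))]
    return out


def copy_random(tokens, rng):
    """Duplicate one random token in place."""
    out = list(tokens)
    if out:
        i = rng.randrange(len(out))
        out.insert(i, out[i])
    return out


def enhance_once(tokens, rng):
    """Apply one randomly selected operation out of the four."""
    ops = [
        lambda t: replace_from(HOMOPHONES, t, rng),
        lambda t: replace_from(SYNONYMS, t, rng),
        lambda t: delete_random(t, rng),
        lambda t: copy_random(t, rng),
    ]
    return rng.choice(ops)(tokens)


def enhance_dataset(samples, times_per_label, seed=0):
    """Keep every original sample and add `times_per_label[label]`
    enhanced copies of it (assumed: rarer labels get larger counts)."""
    rng = random.Random(seed)
    out = list(samples)
    for tokens, label in samples:
        for _ in range(times_per_label.get(label, 0)):
            out.append((enhance_once(tokens, rng), label))
    return out
```

For example, with four samples of a frequent label and one of a rare label, `enhance_dataset(samples, {"rare": 3})` leaves the frequent label untouched and yields four samples of the rare one.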
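
The NB-TextCNN fusion step (concatenate the NB prediction with the pooled TextCNN features, weight each element, pass the result through a fully connected layer, apply Softmax) can be illustrated with a small dependency-free sketch. The function name, the argument names, and the concatenation order are assumptions made for illustration; in the actual model the element weights and the fully connected layer would be learned inside TextCNN.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(nb_probs, pooled, elem_weights, fc_weights, fc_bias):
    """Hypothetical forward pass of the fusion head.

    nb_probs     : NB class probabilities for one text
    pooled       : features produced by TextCNN pooling for the same text
    elem_weights : one weight per element of the concatenated feature
    fc_weights   : one row per output class, one column per fused element
    fc_bias      : one bias per output class
    """
    # Concatenate the category distribution with the semantic features
    # (placing the NB part first is an assumption).
    fused = list(nb_probs) + list(pooled)
    if len(elem_weights) != len(fused):
        raise ValueError("need one weight per fused feature element")
    # Element-wise weighting of the fused feature.
    weighted = [w * x for w, x in zip(elem_weights, fused)]
    # Fully connected layer followed by Softmax.
    logits = [
        sum(w * x for w, x in zip(row, weighted)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(logits)
```

The returned list has one probability per class and sums to 1; the predicted emotion category is the index of its largest entry.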