
Research Of Imbalanced Text Tendency Classification For Network Public Opinion Based On Three-way Decisions

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Wan
GTID: 2428330614458471
Subject: Computer technology
Abstract/Summary:
With the continuous development of Internet technology and online media platforms in the new media era, network public opinion spreads widely and rapidly through society. Text-based online public opinion has two main characteristics: it carries a strong sentiment tendency, and the distribution of that tendency is imbalanced. Imbalanced text tendency classification has therefore become a key technology for analyzing it. Machine learning-based imbalanced text tendency classification can improve classifier performance to some extent, but when facing massive amounts of text the feature dimension becomes too high, and the feature words that contribute most to a category cannot be selected effectively. This thesis therefore studies feature selection based on the idea of three-way decisions and proposes a new feature selection algorithm, TWD-FS. In addition, on the deep learning side, it proposes a multi-channel model fusion classification method that can fully extract the deep semantic information of the text. The main contents of the thesis are as follows:

1. Based on the distribution of sentiment features and samples in imbalanced texts, an imbalanced text feature selection algorithm built on three-way decisions, TWD-FS, is proposed. First, feature word sets are generated by two supervised feature selection methods. Second, to reduce the number of feature words and the feature dimension, the algorithm combines the two supervised feature selection methods. Finally, the method reduces the imbalance of sentiment features by explicitly combining positive and negative sentiment features. Experimental results on the SOCC news review dataset show that TWD-FS can effectively improve the performance of imbalanced text tendency classification.

2. Taking into account the advantages and disadvantages of several deep learning models, as well as the characteristics of imbalanced text distribution, a new model, multi-channel TR-LSTM-CNN, is designed following the idea of model fusion to improve classification performance. First, a multi-channel random under-sampling method is used to construct multiple sample sets with a balanced distribution of sentiment tendencies. Second, a Transformer learns the deep semantic information of the entire text and generates the global semantic vector of the text. Third, the LSTM and CNN models generate the local semantic vector of the text. Finally, the two parts are concatenated to form the final semantic vector. Comparisons with multiple deep learning models on multiple news comment datasets validate the superiority of the proposed multi-channel TR-LSTM-CNN model.
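The abstract does not say how TWD-FS scores or thresholds feature words, so the following is only a generic illustration of the three-way decision idea it builds on: each candidate is accepted, rejected, or deferred to a boundary region. The function name `three_way_partition`, the per-feature `scores` in [0, 1], and the thresholds `alpha` and `beta` are all assumptions made for this sketch, not part of the thesis.

```python
def three_way_partition(scores, alpha=0.7, beta=0.3):
    """Split features into accept / defer / reject regions.

    scores: dict mapping a feature word to a relevance score in [0, 1]
            (hypothetical; e.g. a normalized feature selection score).
    alpha, beta: illustrative thresholds with beta < alpha.
    """
    if not beta < alpha:
        raise ValueError("beta must be smaller than alpha")

    positive, boundary, negative = [], [], []
    for feature, score in scores.items():
        if score >= alpha:
            positive.append(feature)   # accept: keep the feature
        elif score <= beta:
            negative.append(feature)   # reject: drop the feature
        else:
            boundary.append(feature)   # defer: needs further evidence
    return positive, boundary, negative


if __name__ == "__main__":
    demo_scores = {"great": 0.9, "the": 0.1, "fine": 0.5}
    print(three_way_partition(demo_scores))
```

One plausible use of the boundary region, again an assumption, is to re-examine those features with a second selection method rather than discarding them outright.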
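The multi-channel random under-sampling step can be sketched roughly as follows. This is a minimal illustration, assuming that each channel independently draws as many samples from every class as the smallest class contains; the function name `multi_channel_undersample` and the parameters `n_channels` and `seed` are hypothetical, and the thesis may construct its channels differently.

```python
import random
from collections import defaultdict


def multi_channel_undersample(samples, labels, n_channels=3, seed=0):
    """Build n_channels class-balanced subsets by random under-sampling.

    Assumption: every channel draws, without replacement, as many samples
    from each class as the smallest class contains.
    """
    rng = random.Random(seed)

    # Group the samples by their sentiment label.
    by_class = defaultdict(list)
    for text, label in zip(samples, labels):
        by_class[label].append(text)

    minority_size = min(len(items) for items in by_class.values())

    channels = []
    for _ in range(n_channels):
        subset = []
        for label, items in by_class.items():
            picked = rng.sample(items, minority_size)
            subset.extend((text, label) for text in picked)
        rng.shuffle(subset)  # avoid class-ordered batches
        channels.append(subset)
    return channels


if __name__ == "__main__":
    texts = [f"neg{i}" for i in range(10)] + ["pos0", "pos1"]
    tags = ["neg"] * 10 + ["pos"] * 2
    for channel in multi_channel_undersample(texts, tags, n_channels=2):
        print(channel)
```

Because each channel draws a different random subset of the majority class, the channels together see more of the majority data than a single under-sampled set would.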
Keywords/Search Tags: imbalanced text, feature selection, three-way decisions, deep learning, model fusion