Sentiment Text Classification Research Integrating CNN And Bi-LSTM Deep Learning Algorithms

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: W L Jiang
GTID: 2518306095990299
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, as we have moved deeper into the information age, the amount of information that needs processing has grown exponentially, and text categorization has become an active research topic. Various methods exist for improving the quality of text classification. As text corpora keep growing, however, several problems remain:
- word segmentation that works only within a limited domain;
- low classification accuracy caused by imbalanced data distributions;
- the difficulty of labeling sentiment text;
- the limited applicability of existing classification algorithms.

To address these problems, our research begins with domain-specific lexicons. We use a dynamic programming word segmentation algorithm to improve domain adaptability, which mitigates the misclassification caused by domain-specific words. In this thesis, we construct a lexicon for the metallurgical domain and a lexicon for sentiment text. In our experiments, we compare the dynamic programming segmentation algorithm with the Smallseg and Snailseg segmentation models on 3 corpora: a metallurgical text data set, the MSRA data set and the Weibo_senti_100K data set. The experimental results show that the segmentation model is effective when combined with a domain-specific lexicon.

Building on this segmentation model, we combine K-means++, CNN (convolutional neural networks) and Bi-LSTM (bidirectional long short-term memory) into a new model, abbreviated KCBL. In this model, we apply K-means++ clustering to the sentiment documents and resample them to balance their distribution. We then train an end-to-end learner for each document cluster. Together, these steps address both the uneven distribution of text data and the difficulty of labeling it. To verify the performance of KCBL, we apply it to 8 corpora and compare it against 7 state-of-the-art classifiers: CNN, LSTM, Bi-LSTM, CapsNet, CNN-LSTM, CNN-Bi-LSTM, and (K-means++) + CNN-LSTM (KCL for short). On 4 evaluation metrics (accuracy, recall, F1 score and AUC), KCBL outperforms the 7 classifiers on the 8 corpora. Moreover, statistical significance tests show that KCBL is highly competitive with the other 7 classifiers. In summary, we conclude that KCBL outperforms state-of-the-art classifiers and is well suited to text classification.
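The abstract does not give the details of the dynamic programming segmentation algorithm. As a rough illustration only, the following sketch assumes one common formulation: choose the split of a sentence that maximizes the sum of log word probabilities, with those probabilities estimated from lexicon frequency counts. Out-of-vocabulary single characters receive a fixed penalty, set by the hypothetical parameter `unknown_logp`. The toy lexicons `general` and `domain` and their counts are invented. They illustrate how adding domain terms such as 高炉 (blast furnace) and 炼铁 (ironmaking) changes the segmentation of 高炉炼铁工艺 ("blast-furnace ironmaking process").

```python
import math


def segment(text, lexicon, unknown_logp=-20.0):
    """Max-probability segmentation of `text` by dynamic programming.

    `lexicon` maps words to frequency counts (toy values here). `unknown_logp`
    is a hypothetical fixed log-probability for out-of-vocabulary single characters.
    """
    total = sum(lexicon.values())
    max_len = max((len(w) for w in lexicon), default=1)
    n = len(text)
    # best[i] = (best total log-prob of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon:
                logp = math.log(lexicon[word] / total)
            elif end - start == 1:
                logp = unknown_logp  # unknown single character: always allowed
            else:
                continue  # unknown multi-character strings are never words
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # follow the back-pointers from the end of the text
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical toy lexicons: a general one, and the same one plus two metallurgical terms.
general = {"高": 10, "炉": 2, "炼": 2, "铁": 5, "工": 3, "艺": 1, "工艺": 8}
domain = {**general, "高炉": 6, "炼铁": 6}

print(segment("高炉炼铁工艺", general))  # ['高', '炉', '炼', '铁', '工艺']
print(segment("高炉炼铁工艺", domain))   # ['高炉', '炼铁', '工艺']
```

The general lexicon lacks the two domain terms, so those characters are split one by one. Once the terms are added, each two-character word scores higher than its pieces.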
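The abstract states that K-means++ clustering is applied and the documents are resampled to balance their distribution. It does not specify the document features or the resampling rule. The sketch below makes the following assumptions:
- Documents are already embedded as numeric vectors.
- Clustering uses standard K-means++ seeding (D² sampling) followed by Lloyd iterations.
- The resampling rule, chosen here as one possibility, randomly oversamples every sentiment class present in a cluster up to the size of that cluster's largest class.

The function names, the toy 2-D points and the oversampling rule are all hypothetical. The per-cluster CNN/Bi-LSTM learners are not shown.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen centre
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centre
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm started from K-means++ centres; returns a cluster id per point."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point goes to its nearest centre
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # update step: move each centre to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centre if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def balance_within_clusters(labels, classes, seed=0):
    """Hypothetical resampling rule: in each cluster, oversample every class present
    (with replacement) up to the size of that cluster's largest class. Returns indices."""
    rng = random.Random(seed)
    groups = defaultdict(lambda: defaultdict(list))
    for i, (cluster, y) in enumerate(zip(labels, classes)):
        groups[cluster][y].append(i)
    resampled = []
    for by_class in groups.values():
        target = max(len(ix) for ix in by_class.values())
        for ix in by_class.values():
            resampled.extend(ix)
            resampled.extend(rng.choices(ix, k=target - len(ix)))
    return resampled


# Toy usage: two well-separated groups of 2-D "document vectors" with sentiment labels.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
classes = ["pos", "pos", "neg", "neg", "neg", "pos", "neg"]
labels = kmeans(points, k=2)
idx = balance_within_clusters(labels, classes)
for c in sorted(set(labels)):
    print(c, Counter(classes[i] for i in idx if labels[i] == c))
```

Balancing inside each cluster, rather than over the whole corpus, keeps each per-cluster learner's training set balanced. That matches the abstract's description of one learner per document cluster, though the thesis may use a different rule.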
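The four evaluation metrics named in the abstract (accuracy, recall, F1 score and AUC) have standard definitions. For a binary sentiment task, a minimal stdlib sketch might compute them as follows. The function name, the 0.5 decision threshold and the convention that label 1 is positive are assumptions. Multi-class corpora would also need an averaging rule (for example macro averaging), which the abstract does not specify. AUC uses its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half.

```python
def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, recall, F1 and ROC AUC for binary labels (1 = positive, 0 = negative)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "auc": auc}


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
# {'accuracy': 0.5, 'recall': 0.5, 'f1': 0.5, 'auc': 0.75}
```

The pairwise AUC loop is quadratic in the number of examples. It is kept only for clarity, and a sort-based rank computation gives the same value on large corpora.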
Keywords/Search Tags: Text categorization, domain dictionary, imbalanced dataset, unlabeled dataset, data resampling