Sentiment Text Classification Research Integrating CNN And Bi-LSTM Deep Learning Algorithms

Posted on:2021-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:W L Jiang

GTID:2518306095990299

Subject:Signal and Information Processing

Abstract/Summary:

In recent years, at the height of the information age, the amount of information that needs processing has grown exponentially, and text categorization has become a hot research topic. Although various methods exist to improve the quality of text classification, as text corpora grow we still face several problems: word segmentation that is limited to particular domains, low classification accuracy caused by unbalanced data distribution, difficulty in labeling emotional text, and low applicability of classification algorithms. To address these problems, our research starts by building domain-specific lexicons and using a dynamic programming word segmentation algorithm to increase domain adaptability. This resolves the problem of domain-specific words degrading classification. In this thesis, a lexicon for the metallurgical field and a lexicon for emotional text are constructed. In the experiments, we compare the dynamic programming word segmentation algorithm with the Smallseg and Snailseg segmentation models on 3 corpora (a metallurgical text data set, the MSRA data set, and the Weibo_senti_100K data set). The experimental results show that the segmentation model is effective when combined with a domain-specific lexicon.

Further, building on this segmentation model, we combine K-means++, CNN (convolutional neural network), and Bi-LSTM (bidirectional long short-term memory) techniques to build a new model (abbreviated KCBL). In this model, we apply K-means++ clustering to sentiments and resample documents to even out their distribution. For each document cluster we construct end-to-end learners, thus addressing both the uneven distribution of text data and the difficulty of labeling it. To verify the performance of KCBL, we applied it to 8 corpora and compared it against 7 state-of-the-art classifiers: CNN, LSTM, Bi-LSTM, CapsNet, CNN-LSTM, CNN-Bi-LSTM, and (K-means++) + CNN-LSTM (KCL for short). On 4 evaluation metrics (accuracy, recall, F1 score, and AUC), KCBL outperformed the 7 classifiers on the 8 corpora. Moreover, statistical significance tests show that KCBL is highly competitive with the other 7 classifiers. In summary, we conclude that KCBL outperforms state-of-the-art classifiers and is well suited to text classification.
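
The abstract does not spell out the dynamic programming segmentation algorithm, so the following is only a minimal sketch of one common approach: lowest-cost segmentation over a word lexicon, where each word costs the negative log of its relative frequency. The function name `segment`, the parameters `max_word_len` and `unknown_cost`, and all word counts are hypothetical and chosen for illustration; they are not taken from the thesis.

```python
import math


def segment(text, lexicon, max_word_len=8, unknown_cost=20.0):
    """Split `text` into the lowest-cost sequence of lexicon words.

    `lexicon` maps word -> count (illustrative values, not real data).
    A character that no lexicon word covers becomes a single-character
    token with the fixed penalty `unknown_cost`.
    """
    total = sum(lexicon.values())
    # Cost of a word = negative log of its relative frequency.
    cost = {w: -math.log(c / total) for w, c in lexicon.items()}

    n = len(text)
    best = [0.0] + [math.inf] * n  # best[i]: minimal cost of segmenting text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last word ending at i

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            piece = text[start:end]
            if piece in cost:
                c = cost[piece]
            elif end - start == 1:
                c = unknown_cost   # out-of-lexicon single character
            else:
                continue           # multi-character piece not in the lexicon
            if best[start] + c < best[end]:
                best[end] = best[start] + c
                back[end] = start

    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Hypothetical general-purpose counts and two metallurgical domain entries.
general = {"高": 50, "炉": 10, "炼": 10, "铁": 30, "工艺": 40}
metallurgy = {"高炉": 20, "炼铁": 20}

text = "高炉炼铁工艺"  # "blast furnace ironmaking process"
print(segment(text, general))
print(segment(text, {**general, **metallurgy}))
```

With these made-up counts, the general lexicon alone breaks 高炉 and 炼铁 into single characters, while merging in the two domain entries keeps them whole. This illustrates the kind of effect the abstract attributes to adding a domain-specific lexicon.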

Keywords/Search Tags:

Text categorization, domain dictionary, imbalanced dataset, unlabeled dataset, data resampling

Related items

1	Research On Text Classification Model And Algorithm For Small Dataset
2	Research On Imbalanced Dataset Classification Based On Oversampling Technique
3	Research On Optimization Algorithm For Dataset Covering Problem
4	Research On Ensemble Learning Approaches To Imbalanced Data Sets
5	User Complaint Prediction System Based On The KPI Dataset From IPTV Set-Top Box
6	Research And Application Of Imbalanced Dataset Classification Prediction Algorithm
7	Research On Automatic Synthesis Algorithm And Detection Model Of Carton Dataset For Domain Generalization
8	Research On Algorithm To Intrusion Detection Classification Based On Imbalanced Dataset And Decision Tree
9	Research On Classification Algorithm For Imbalanced Data
10	Research On Classification Of Imbalanced Dataset Based On Generative Adversarial Networks