With the rapid development of computer technology, a huge amount of text data has emerged on the Internet, and how to filter the information one needs from this heterogeneous data has become a hot topic in artificial intelligence research. Text classification, a fundamental task of natural language processing, analyzes input text and assigns it to one of a set of pre-defined categories, and it can be applied in a wide variety of scenarios. In recent years deep learning has become a research hotspot: text classification models based on deep learning have strong representational and generalization capabilities, can learn rich features from data, and can handle text of different types and lengths. In this paper we investigate deep-learning-based text classification algorithms and propose two models.

(1) A text classification model based on temporal convolution and bidirectional GRU is proposed for the Chinese long-text classification task. The model first randomly initializes the word vectors of the original text and then feeds them into a BERT model, which trains the word vectors to represent the text. Because BERT limits the input length, the long text is divided into segments, which are fed in segment by segment. During sentence-vector generation, the BERT model is augmented with word feature vectors extracted by temporal convolution. Finally, the BERT output is fed into a bidirectional GRU network to further extract text features, and an attention mechanism is introduced to highlight the information most relevant to the classification task. The proposed model achieves 88.25% and 92.77% classification accuracy on the Fudan University Chinese dataset and the Sogou CS Chinese dataset, respectively.

(2) A text classification model based on ALBERT and GCN is proposed for the Chinese short-text classification task. EDA data augmentation is first applied to the short-text dataset. A word-vector representation of the text is then produced with the GloVe model; this initialization makes fuller use of the corpus. The initialized word vectors are fed into a pre-trained ALBERT model to obtain feature word-vector representations. In parallel, a text graph is constructed from the text data, and the ALBERT feature vectors together with the text graph are fed into a GCN model for training to obtain the hidden-state vectors of the last layer. These hidden-state vectors are then combined with the text features captured by ALBERT through a multi-head attention mechanism to form the final features used to predict the category of a given text. The model achieves 91.82% and 93.67% classification accuracy on the Toutiao ("Today's Headlines") text dataset and the THUCNews Chinese dataset, respectively.
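Two steps of model (1) can be illustrated concretely: splitting an over-long token sequence into overlapping chunks that fit BERT's 512-token window, and attention-weighted pooling over the bidirectional GRU outputs. The sketch below is a minimal illustration of both ideas; the chunk size, stride, and the learned attention query `w` are illustrative assumptions, not values fixed by the thesis.

```python
import numpy as np

def segment_tokens(tokens, max_len=510, stride=255):
    """Split a long token sequence into overlapping chunks that each
    fit within BERT's input limit (512 including [CLS] and [SEP]).
    max_len and stride are assumed values for illustration."""
    if len(tokens) <= max_len:
        return [tokens]
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks

def attention_pool(H, w):
    """Attention-weighted pooling: score each time step of the BiGRU
    output H (shape T x d) against a query vector w (shape d), softmax
    the scores, and return the weighted sum of the time steps."""
    scores = H @ w
    scores = scores - scores.max()           # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H
```

With a zero query the pooling degenerates to a plain average of the time steps; training `w` is what lets the model emphasize the steps most relevant to classification.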
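The GCN step in model (2) propagates node features over the text graph using the standard layer rule H' = sigma(A_hat H W), where A_hat is the adjacency matrix with self-loops, symmetrically normalized by node degree. The sketch below is a minimal single-layer NumPy illustration; the toy graph and identity weights in the usage note are assumptions for demonstration, not the thesis's actual graph construction.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W), where
    A_hat = D^{-1/2} (A + I) D^{-1/2} adds self-loops to the adjacency
    matrix A and symmetrically normalizes it by node degree."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)         # ReLU activation
```

For a two-node graph with one edge and identity features and weights, each output row is the average of the two nodes' features, showing how the layer mixes information between neighbors before the result is fused with ALBERT's features.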