
Research On Chinese Text Classification Based On BERT And Graph Convolution

Posted on: 2024-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhao
Full Text: PDF
GTID: 2568307106977389
Subject: Electronic information
Abstract/Summary:
Text classification plays a significant role in natural language processing and is a research area of great importance. It has proven valuable in many fields, including agriculture, business, law, and social science. The sheer volume of available text makes it increasingly difficult for users to distinguish between documents, which highlights the research value of effective text categorization. This thesis focuses on two sub-tasks, Chinese short text classification and Chinese long text classification, and addresses the problems of intricate semantics, sparse text features, and uneven feature distribution in Chinese texts. The work is as follows.

First, this thesis proposes a short text classification model based on multi-level semantic fusion to address the incomplete sentence structure, noise, and sparse features of Chinese short texts. The model extracts key semantic information from short texts, including word-level and sentence-level features. Initial semantic features are extracted with the BERT model; the first Bi-GRU layer, combined with a word-level self-attention mechanism, increases the weight of text segments carrying crucial semantics, and the second Bi-GRU layer, aided by a sentence-level self-attention mechanism, fuses features at the word-vector and sentence-vector levels. A CNN max-pooling strategy then screens the optimal features and predicts category probabilities. In addition, the loss function is modified to select hard-to-classify samples for continued training by adding thresholds, and adaptive adjustment factors are introduced into backpropagation to accelerate model fitting by decaying the learning rate according to gradient values. Multiple experiments show that this model improves the accuracy of Chinese short text classification, leading to a significant improvement in overall classification performance.

Second, this thesis proposes a long text classification model that addresses highly complex and unevenly distributed text features. The model is based on TFGCN and a hierarchical GRU, which together extract text features containing both global structural information and local temporal information. First, a topic fusion algorithm generates topic nodes: implied topic words from the LDA model are fused with keywords identified by the TFC algorithm to create converged topic nodes, and these nodes are added to the text graph to form a heterogeneous graph convolution model. By propagating topic information via graph convolution, the model alleviates the problems of complex semantics and uneven feature distribution. Second, multiple attention mechanisms assign corresponding weight-calculation methods to different node types; the model captures the influence of neighbor nodes on the central node and thereby improves the aggregation of node information. Finally, a hierarchical GRU module extracts local semantic information from long texts at three levels: combining a self-attention mechanism with Bi-GRU at the word, sentence, and segment levels compensates for the local semantic information missing from the heterogeneous graph convolution model. Comparative experiments, ablation experiments, and parameter selection experiments confirm that the proposed model improves the accuracy of Chinese long text classification.
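The topic-node graph convolution described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy graph, edge choices, feature dimensions, and the `normalize_adjacency`/`gcn_layer` helpers are all hypothetical, and a single standard GCN layer stands in for the full heterogeneous model with its per-node-type attention.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(0.0, A_norm @ H @ W)

rng = np.random.default_rng(0)

# Toy text graph: nodes 0-1 are documents, nodes 2-4 are words.
# Undirected edges stand in for document-word and word co-occurrence links.
A = np.zeros((5, 5))
A[0, 2] = A[2, 0] = 1.0
A[0, 3] = A[3, 0] = 1.0
A[1, 3] = A[3, 1] = 1.0
A[1, 4] = A[4, 1] = 1.0
A[2, 3] = A[3, 2] = 1.0

# Topic fusion step (sketch): append one fused topic node and connect it
# to the word nodes it covers, so topic information propagates through
# the graph convolution to documents and words alike.
topic_words = [2, 3]            # hypothetical words under the fused topic
n = A.shape[0] + 1
A_topic = np.zeros((n, n))
A_topic[:-1, :-1] = A
for w in topic_words:
    A_topic[-1, w] = A_topic[w, -1] = 1.0

H0 = rng.normal(size=(n, 8))    # initial node features (assumed dim 8)
W = rng.normal(size=(8, 4))     # layer weights (assumed output dim 4)

A_norm = normalize_adjacency(A_topic)
H1 = gcn_layer(A_norm, H0, W)
print(H1.shape)                  # (6, 4): every node, including the topic node
```

After one layer, each document node's representation already mixes in information from the topic node two hops away via shared words, which is the mechanism the abstract relies on to alleviate sparse and uneven features.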
Keywords/Search Tags:Text classification, Self-attention mechanisms, Recurrent neural network, Graph convolutional network