Research On Multi-label Long Text Classification Algorithm Based On Transformer

Posted on:2024-08-26

Degree:Master

Type:Thesis

Country:China

Candidate:M J Tang

Full Text:PDF

GTID:2568306920986369

Subject:Electronic information

Abstract/Summary:

Text classification is a key research direction and a classic problem in natural language processing. It has a wide range of applications, such as sentiment analysis, topic classification, user intent prediction, mail filtering, and user profiling. Unlike traditional binary and multi-class classification, however, a single text often corresponds to multiple labels, so predicting its categories becomes a multi-label classification problem that requires selecting several correct labels from the label system. In addition, traditional text classification handles long text by truncating, segmenting, or compressing it. Much of the content is lost in the process, and the lost information is often what the multi-label classification of very long text depends on. Traditional methods therefore cannot effectively classify long text with multiple labels, and research on multi-label classification of long text has significant application value. To address the problems of long text classification in industry, this thesis studies the following two aspects:

(1) Traditional CNN and RNN deep learning algorithms are inefficient at multi-label classification of long text because of timing problems and spatial permutation problems. To address this, an improved Transformer-based text classification model suited to long text processing is proposed. A Segment document segmentation step splits the long text into segments of a specific length, a sliding-window attention mechanism links the segments, and recurrent (cyclic) information processing together with relative position encoding is used to process the text content and output the classification results.

(2) Combined with the Latent Dirichlet Allocation (LDA) algorithm, a TRM-LDA topic classification model suited to multi-label classification of very long text is proposed. With an improved Multi-Head attention mechanism, text features are extracted at fine granularity and the text is classified with higher accuracy, so that documents can be assigned multiple labels.

Finally, comparative experiments show that, for multi-label classification of long text, the proposed solution significantly improves the P (precision), R (recall), and F1 values compared with traditional deep learning algorithms that rely on truncation and compression.
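
As a rough illustration of two ideas mentioned in the abstract, the sketch below splits a token sequence into fixed-length, overlapping segments and computes P, R, and F1 for multi-label predictions. It is not the thesis implementation: the function names `segment_tokens` and `micro_prf`, the `seg_len` and `stride` parameters, and the choice of micro-averaging are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import List, Sequence, Set, Tuple


def segment_tokens(tokens: Sequence[str], seg_len: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most seg_len items (hypothetical helper).

    A stride smaller than seg_len makes neighbouring segments overlap,
    so no part of the long text is truncated away.
    """
    if seg_len <= 0 or stride <= 0:
        raise ValueError("seg_len and stride must be positive")
    segments: List[List[str]] = []
    start = 0
    while True:
        segments.append(list(tokens[start:start + seg_len]))
        if start + seg_len >= len(tokens):
            break  # the current window already reaches the end of the text
        start += stride
    return segments


def micro_prf(
    true_labels: Sequence[Set[str]], pred_labels: Sequence[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over label sets (assumed averaging)."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(len(t & p) for t, p in pairs)  # labels predicted and correct
    fp = sum(len(p - t) for t, p in pairs)  # labels predicted but wrong
    fn = sum(len(t - p) for t, p in pairs)  # correct labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Ten toy tokens, windows of four tokens, moving three tokens at a time.
    print(segment_tokens(list("abcdefghij"), seg_len=4, stride=3))
    # Two toy documents, each with a set of gold labels and predicted labels.
    print(micro_prf([{"a", "b"}, {"c"}], [{"a"}, {"c", "d"}]))
```

In a pipeline of this kind, each segment would be encoded separately and the per-segment outputs combined before the label decisions are made. How the thesis combines them is not stated in the abstract.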

Keywords/Search Tags:

long text, multi-label, Transformer, text classification, LDA

Related items

1	Research On Feature Extraction Of Multi-label Text Classification
2	Algorithm Research For Chinese Text Multi-label Classification
3	Multi-label Text Classification Based On Long Short-Term Memory
4	Research On Extreme Multi-label Text Classification Based On Label Knowledge
5	Research On Text Multi-label Classification Algorithm Based On Label Correlation
6	Research On Multi-label Classification Network For Chinese Text And Noisy Labels
7	Research And Implementation On Text Classification In Vertical Domain
8	Research On Multi-label Text Classification Based On Deep Learning
9	Research And Implementation Of Multi-label Text Classification Method For Threat Extraction
10	Research On Multi-Label Text Classification Based On Deep Learning