In multi-label text classification (MLTC), each document is associated with a set of correlated labels. In this paper, we study two important problems in MLTC: weak text representation and the modeling of correlations among classification tasks. We propose a joint embedding of text and labels, a text representation guided by text-label correlation, and label correlation learning through a multi-task framework. To address the problem that text representations in MLTC are not discriminative, this paper experimentally compares models with and without label information and different embedding methods (joint and non-joint embedding), and proposes a joint embedding strategy that implicitly captures text-label and label-label correlations while reducing the model’s dependence on label description information. Since capturing text-label correlation plays a key role in obtaining text representations, after obtaining the joint embedding of text and labels, this paper further proposes a two-stage attention mechanism (a self-attention network followed by text-label cross-attention) that computes a correlation matrix to explicitly model the correlation between text and labels, and then derives a differentially weighted global text representation. Models for MLTC commonly predict low-frequency labels poorly. To address this problem, previous classifier chains and Seq2Seq models transformed the task into a sequence prediction task and solved it by modeling label correlations. However, these models tend to suffer from label order dependency, overfitting to label combinations, and error propagation. To avoid these problems, this paper proposes two auxiliary label co-occurrence prediction tasks that strengthen label correlation learning and further alleviate the long-tail problem. The resulting model achieves better performance on public datasets, trains faster than Seq2Seq-based models, and fits label combinations better. Finally, we design and implement an MLTC system. Based on the MT-LACO model proposed in this paper, the system analyzes the input text to be classified, predicts the relevant labels, and outputs the classification results.
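
As an illustration of how an auxiliary label co-occurrence prediction task could supply extra supervision, the following minimal sketch builds pairwise co-occurrence targets for one document and combines a main loss with auxiliary losses. The abstract does not specify how the two auxiliary tasks are defined, so the pairing scheme, the function names `pairwise_cooccurrence_examples` and `joint_loss`, and the weight `aux_weight` are all assumptions made for illustration and are not the MT-LACO implementation.

```python
from itertools import combinations


def pairwise_cooccurrence_examples(doc_labels, label_space):
    """Build (label_a, label_b, target) triples for one document.

    Assumed scheme (hypothetical, for illustration only):
      target = 1 when both labels are relevant to the document,
      target = 0 when label_a is relevant and label_b is not.
    """
    relevant = sorted(set(doc_labels))
    irrelevant = sorted(set(label_space) - set(doc_labels))
    # Every unordered pair of relevant labels co-occurs in this document.
    positives = [(a, b, 1) for a, b in combinations(relevant, 2)]
    # A relevant label paired with an irrelevant one does not co-occur here.
    negatives = [(a, b, 0) for a in relevant for b in irrelevant]
    return positives + negatives


def joint_loss(main_loss, aux_losses, aux_weight=0.1):
    """Multi-task objective: main classification loss plus weighted auxiliary losses.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return main_loss + aux_weight * sum(aux_losses)


if __name__ == "__main__":
    examples = pairwise_cooccurrence_examples(
        ["sports", "health"], ["sports", "health", "politics"]
    )
    for example in examples:
        print(example)
    print(joint_loss(1.0, [0.5, 0.5]))
```

Under this assumed scheme, low-frequency labels appear in many more training pairs than they do as individual targets, which is one plausible way an auxiliary task of this kind could help with the long-tail problem without imposing a label order.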