| Text classification is a key research direction and a classic problem in natural language processing. It has a wide range of applications, such as sentiment analysis, topic classification, user intent prediction, email filtering, and user profiling. Unlike traditional binary and multi-class classification, however, a single text often corresponds to several labels, so predicting its categories becomes a multi-label classification problem: multiple correct labels must be selected from the label system. In addition, traditional text classification truncates, segments, or compresses long text, so much of the content is lost during classification, and the lost content is often exactly what supports multi-label classification of very long text. Traditional methods therefore cannot effectively classify long text with multiple labels, and research on multi-label classification of long text has significant application value. Addressing the problems of long text classification in industry, this thesis studies the following two aspects: (1) To address the low efficiency of long text multi-label classification caused by the timing and spatial permutation problems of traditional CNN and RNN deep learning algorithms, an improved Transformer-based text classification model suited to long text processing is proposed. A Segment document segmentation step splits the long text into segments of a specified length; the segments are connected through a sliding-window attention mechanism, and cyclic information processing together with relative position encoding is used to process the text content and output the classification results. (2) Combined with the latent Dirichlet allocation (LDA) algorithm, a TRM-LDA topic classification model suited to multi-label classification of very long text is proposed. Using an improved Multi-Head attention mechanism, it extracts text features at fine granularity and classifies text with higher accuracy, so that documents can be assigned multiple labels. Finally, comparative experiments show that, for multi-label classification of long text, the proposed solution significantly improves precision (P), recall (R), and F1 compared with traditional deep learning algorithms that rely on truncation and compression. |
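
The segmentation step described in aspect (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `segment_tokens`, the parameters `seg_len` and `stride`, and the choice to overlap neighbouring segments are all assumptions made for illustration. The sketch only shows how a long token sequence might be cut into fixed-length windows without discarding any content, in contrast to truncation.

```python
from typing import List, Sequence


def segment_tokens(tokens: Sequence[str], seg_len: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most seg_len tokens, advancing by stride.

    Hypothetical helper: with stride < seg_len neighbouring windows overlap,
    with stride == seg_len they are disjoint.
    """
    if seg_len <= 0 or stride <= 0:
        raise ValueError("seg_len and stride must be positive")
    if stride > seg_len:
        raise ValueError("a stride larger than seg_len would skip tokens")

    segments: List[List[str]] = []
    start = 0
    while start < len(tokens):
        segments.append(list(tokens[start:start + seg_len]))
        if start + seg_len >= len(tokens):
            break  # this window already reaches the end of the text
        start += stride
    return segments


# Toy input standing in for a tokenized long document.
tokens = [f"t{i}" for i in range(10)]
segments = segment_tokens(tokens, seg_len=4, stride=2)
```

Every token appears in at least one segment, so a downstream model can, in principle, see the whole document instead of a truncated prefix. How segments exchange information (for example through recurrence and relative position encoding, as the abstract describes) is outside the scope of this sketch.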