
Topic Awareness Model And Training Efficiency Optimization For Text Multi-label Classification

Posted on: 2023-09-29  Degree: Master  Type: Thesis
Country: China  Candidate: J B Zhao  Full Text: PDF
GTID: 2558306620971059  Subject: Computer application technology
Abstract/Summary:
In the information age, human society produces and accumulates massive amounts of text data in work and daily life, and classifying these data accurately so that they can be managed scientifically has great practical significance. Text multi-label classification is the task of determining, for a given label set, the subset of labels that applies to a document based on its content, and it has many real-world applications. The mainstream research paradigm models the relationship between the words in a document and the labels and focuses on analyzing the correlations among labels. However, existing methods commonly make insufficient use of document clues and neglect the semantic association between words and labels, so they cannot fully describe the semantic similarities and differences between documents, which weakens document representation learning.

To address these problems, this thesis designs a topic-awareness model for multi-label classification. First, the existing GloVe model is modified: the words of each document and its corresponding labels are jointly counted into the co-occurrence matrix, and embedding vectors of words and labels in the same feature space are obtained by training on the classification corpus. Then, on top of existing deep learning models, several topic vectors initialized by a topic model are introduced; the document features produced by the deep network are mapped onto the different topic vectors through an attention mechanism, yielding multiple fine-grained document features that each focus on a different topic. Finally, the fine-grained document features are fused with the original features and interact with the label embedding vectors to produce the classification result. By embedding words and labels into the same feature space, the method captures their semantic association; by generating fine-grained document features through topic awareness, it makes full use of document clues and models the implicit relationship between documents and labels more comprehensively than existing methods. Experiments show that the method substantially improves classification precision.

At the same time, facing the challenge that the volume of text data keeps growing, and targeting the large memory overhead and low computational efficiency of existing methods on large-scale data, this thesis summarizes three training-efficiency optimizations: loading data into memory in batches, constructing an efficient data pipeline between the CPU and GPU, and multi-GPU joint training. Experimental results confirm that these methods significantly improve training efficiency and reduce memory consumption, which is of practical importance.
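The joint word-label co-occurrence counting step could look roughly like the following sketch. The window size, the label token names, and the rule that a label co-occurs once with every word of its document are illustrative assumptions, not the thesis's exact counting scheme.

```python
from collections import Counter

def build_cooccurrence(docs, window=5):
    """Count a joint word-label co-occurrence matrix (sparse, as a Counter).

    `docs` is a list of (tokens, labels) pairs. The window size and the rule
    that a label co-occurs once with every word of its document are assumptions.
    """
    counts = Counter()
    for tokens, labels in docs:
        # GloVe-style windowed word-word co-occurrence, weighted by distance.
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                weight = 1.0 / (j - i)
                counts[(w, tokens[j])] += weight
                counts[(tokens[j], w)] += weight
        # Each label of the document co-occurs with every word in it.
        for lab in labels:
            for w in tokens:
                counts[(lab, w)] += 1.0
                counts[(w, lab)] += 1.0
    return counts

# Toy example: two documents with their label sets.
docs = [
    (["deep", "learning", "for", "text"], ["__label_ML__"]),
    (["topic", "model", "of", "text"], ["__label_NLP__", "__label_ML__"]),
]
matrix = build_cooccurrence(docs)
print(matrix[("__label_ML__", "text")])  # 2.0: the label co-occurs with "text" in both documents
```

Training GloVe on such a matrix places word vectors and label vectors in one shared space, which is what allows the later word-label interaction.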
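A minimal sketch of the topic-aware attention and label-interaction step, written in PyTorch. The encoder is abstracted away as precomputed token features, and the dimensions, pooling choices, and layer names are assumptions made for illustration rather than the thesis's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicAwareClassifier(nn.Module):
    """Sketch: topic vectors attend over token features; the resulting
    fine-grained features are fused with the coarse document feature and
    scored against label embeddings living in the same space."""

    def __init__(self, hidden_dim, num_topics, num_labels, topic_init=None):
        super().__init__()
        # Topic vectors, optionally initialized from a topic model (e.g. LDA).
        init = topic_init if topic_init is not None else torch.randn(num_topics, hidden_dim)
        self.topics = nn.Parameter(init)
        # Label embeddings assumed to share the feature space of the encoder.
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, token_feats):
        # token_feats: (batch, seq_len, hidden_dim), the output of any deep encoder.
        doc_feat = token_feats.mean(dim=1)                       # coarse document feature
        # Each topic attends over the tokens -> one topic-focused document feature per topic.
        scores = torch.einsum('bsh,th->bts', token_feats, self.topics)
        attn = F.softmax(scores, dim=-1)                         # (batch, topics, seq_len)
        topic_feats = torch.einsum('bts,bsh->bth', attn, token_feats)
        fine = topic_feats.mean(dim=1)                           # pool the fine-grained features
        fused = torch.tanh(self.fuse(torch.cat([doc_feat, fine], dim=-1)))
        # Interaction with the label embeddings yields one score per label.
        return fused @ self.label_emb.t()                        # (batch, num_labels)

# Example usage with random encoder outputs (2 documents, 30 tokens each).
model = TopicAwareClassifier(hidden_dim=256, num_topics=8, num_labels=54)
logits = model(torch.randn(2, 30, 256))   # (2, 54) scores, one per label
```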
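The three training-efficiency optimizations can be illustrated with a short PyTorch training loop. The dataset, model, and hyper-parameters below are placeholders; only the use of worker prefetching, pinned memory with non-blocking copies, and DataParallel for multi-GPU joint training reflects the techniques summarized above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

class ShardedTextDataset(Dataset):
    """Serves pre-vectorized examples one record at a time instead of holding
    the whole corpus in memory; random tensors stand in for real shard reads."""

    def __init__(self, num_examples=10_000, feat_dim=300, num_labels=54):
        self.num_examples, self.feat_dim, self.num_labels = num_examples, feat_dim, num_labels

    def __len__(self):
        return self.num_examples

    def __getitem__(self, idx):
        # In practice this would read one record from a memory-mapped shard on disk.
        x = torch.randn(self.feat_dim)
        y = (torch.rand(self.num_labels) > 0.9).float()
        return x, y

def main():
    loader = DataLoader(
        ShardedTextDataset(),
        batch_size=64,
        num_workers=4,      # CPU workers prefetch and prepare batches ahead of the GPU
        pin_memory=True,    # page-locked host memory speeds up host-to-device copies
    )

    model = nn.Sequential(nn.Linear(300, 256), nn.ReLU(), nn.Linear(256, 54))
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)   # single-process multi-GPU joint training
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    for x, y in loader:
        # non_blocking copies overlap the transfer with computation when memory is pinned
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
        optim.zero_grad()
        loss.backward()
        optim.step()

if __name__ == "__main__":
    main()
```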
Keywords/Search Tags:Text multi-label classification, Embedded representation, Topic awareness, Fine-grained features