Research On Multi-Label Text Classification Methods Based On Topic Feature

Posted on: 2021-07-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W S Chen
GTID: 1368330632960582
Subject: Computer application technology
Abstract/Summary:
Automatic text classification is the process of assigning one or more labels to unseen texts on the basis of labeled training texts. With the rapid growth in the volume of documents and the diversification of text content, single-label classification can no longer meet practical needs, and multi-label text classification has become an important research area in natural language processing. This dissertation deepens and extends research on text classification methods, addressing the multi-label problem from three angles: feature extraction for multi-label text classification, the multi-label classification method itself, and the uncertainty of multi-label classification results.

Feature extraction is a fundamental and critical part of multi-label text classification. Traditional feature extraction methods cannot be applied effectively when high-quality annotated text is difficult to obtain. This dissertation proposes a deep topic feature extraction model that combines unsupervised and supervised learning. The model combines the global information features of the document set with the contextual information features inside each document, so that feature extraction is achieved by joining a global feature representation of the document with a local one. This feature extraction method effectively improves the performance of multi-label text classification.

In multi-label text classification, labels are not independent of one another and are usually strongly correlated. In particular, when the number of category labels is large, the size of the output space grows exponentially with the number of labels, which seriously degrades classification performance. To address label correlation, this dissertation proposes a multi-label text classification method based on an encoder-decoder model and deep topic feature extraction. The encoder network uses the deep topic feature extraction model to obtain a semantic encoding vector carrying the deep semantic features of the text. The decoder network treats multi-label text classification as a sequence generation process and introduces an attention mechanism to highlight the influence of key inputs on the output, which effectively alleviates the label correlation problem.

Deep learning models have achieved excellent results in multi-label text classification. However, problems such as noise and missing labels in the text data, and distribution differences between training and test data, make uncertainty pervasive in the task. Natural language processing techniques therefore need the ability to model and reason about uncertainty. To address this, this dissertation proposes an uncertainty quantification model for multi-label text classification based on deep topic features, which models the task from both the data and the model perspectives. It provides an uncertainty measure for multi-label text classification that can effectively handle the uncertainty problem.
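The abstract does not state which uncertainty measure the dissertation uses. As an illustration only, the sketch below shows one common way to separate data uncertainty from model uncertainty for per-label predictions: an entropy-based decomposition over several stochastic forward passes (for example, samples from a Bayesian neural network). The function names, the sample values, and the choice of this decomposition are all assumptions, not the author's method.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose_uncertainty(samples):
    """Split per-label predictive uncertainty into data and model parts.

    samples[t][k] is the predicted probability of label k in the t-th
    stochastic forward pass (hypothetical input format).
    Returns one (total, data, model) tuple per label, where
      total = entropy of the mean prediction,
      data  = mean entropy of the individual predictions,
      model = total - data (disagreement between passes).
    """
    n_passes = len(samples)
    n_labels = len(samples[0])
    results = []
    for k in range(n_labels):
        probs = [samples[t][k] for t in range(n_passes)]
        mean_p = sum(probs) / n_passes
        total = binary_entropy(mean_p)
        data = sum(binary_entropy(p) for p in probs) / n_passes
        results.append((total, data, total - data))
    return results


# Hypothetical probabilities for three labels over four forward passes.
samples = [
    [0.9, 0.5, 0.1],
    [0.9, 0.5, 0.9],
    [0.9, 0.5, 0.1],
    [0.9, 0.5, 0.9],
]
results = decompose_uncertainty(samples)
for k, (total, data, model) in enumerate(results):
    print(f"label {k}: total={total:.3f} data={data:.3f} model={model:.3f}")
```

In this toy input, the passes agree on the first two labels, so their model uncertainty is close to zero even though the second label is maximally ambiguous (high data uncertainty). The passes disagree on the third label, so most of its uncertainty is attributed to the model.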
Keywords/Search Tags: multi-label text classification, topic model, encoder-decoder model, Bayesian neural network, uncertainty quantification