Multi-label Document Classification Based On Hierarchical Supervision

Posted on:2019-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Xie

Full Text:PDF

GTID:2428330545986959

Subject:Computer software and theory

Abstract/Summary:

Text classification is an active topic in data mining. Extracting content from textual data quickly and effectively, and classifying texts automatically by that content, has become a mainstream data mining task. Traditional single-label document classification assumes that each document belongs to exactly one category and that the categories are independent of one another. In practical applications, however, a document can carry multiple labels, and a label can be shared by different documents. The task of multi-label classification is to learn a document classification model from a training set and to assign test samples with unknown labels to multiple categories accurately, so that the assigned labels reflect the actual characteristics of the document more fully. Machine learning methods available for document classification include decision tree models, Bayesian classification, neural network models, topic models, and support vector machines; a text classification system built on them classifies unknown samples automatically.

This thesis focuses on the classification of multi-label documents. For cases where the multi-label text hierarchy is known, or where the hierarchy of the data set can be explored, it improves the traditional topic-model-based classification model. The main work comprises the following three parts:

1) Based on the inherent hierarchical structure of document topics, a hidden layer is introduced to propose the NLDA model. The hidden layer consists of "topic-label" pairs, and the high-layer topics and low-layer labels form a fully connected structure through this duality.

2) Building on the NLDA model, topic-layer supervision is introduced to propose the NSLDA model. The observation is that the number of topics in the documents is much smaller than the number of labels, so classification accuracy at the topic layer is much higher than classification accuracy at the label layer. A stable topic-layer probability distribution is first obtained for each document from the LDA model, and this distribution is then used as input to tune the Gibbs sampling process of the NLDA model, which yields the NSLDA model. In addition, the NSLDA model is extended to handle the diversity of hierarchical structures, which improves the generality of the model.

3) Positive and negative case models are constructed for model fusion. Using the idea of ensemble learning to introduce reinforcement learning, topic model training is divided into two training models (positive and negative). Each model predicts the label probability distribution of the prediction set, and the two predictions are fused according to a certain weight to obtain the final probability distribution, which reduces the risk of overfitting.

The experimental results show that the proposed NLDA and NSLDA models classify well on data sets whose label hierarchy is known, and that the NSLDA model outperforms the NLDA model. Reasonably selecting the positive and negative sample training models and mixing their predicted label probability distributions further improves classification performance.
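The weighted fusion described in part 3 can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so a convex combination with a single weight is assumed here, and the function names `fuse_label_distributions` and `predict_labels`, the `weight` and `threshold` parameters, and the example labels are all hypothetical rather than taken from the thesis.

```python
from typing import Dict, List


def fuse_label_distributions(
    p_pos: Dict[str, float],
    p_neg: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Blend two per-document label distributions (assumed convex combination).

    p_pos / p_neg: label -> probability, as predicted by the positive and
    negative case models. weight: share given to the positive model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    labels = set(p_pos) | set(p_neg)
    mixed = {
        label: weight * p_pos.get(label, 0.0)
        + (1.0 - weight) * p_neg.get(label, 0.0)
        for label in labels
    }
    total = sum(mixed.values())
    if total <= 0.0:
        raise ValueError("fused distribution has no probability mass")
    # Renormalise so the result is again a probability distribution.
    return {label: value / total for label, value in mixed.items()}


def predict_labels(dist: Dict[str, float], threshold: float = 0.3) -> List[str]:
    """Multi-label decision (assumed rule): keep labels at or above a threshold."""
    return sorted(label for label, p in dist.items() if p >= threshold)


if __name__ == "__main__":
    positive = {"sports": 0.6, "politics": 0.3, "tech": 0.1}
    negative = {"sports": 0.4, "politics": 0.2, "tech": 0.4}
    fused = fuse_label_distributions(positive, negative, weight=0.5)
    print(fused)
    print(predict_labels(fused, threshold=0.3))
```

Averaging the two models' outputs is the usual ensemble argument for lower variance; how the thesis actually chooses the weight and the decision rule is not stated in the abstract.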

Keywords/Search Tags:

Multi-label classification, Topic model, Supervised learning, Model fusion

Related items

1	Exploiting Label Relationships In Multi-label Classification
2	Topic Modeling Approaches For Supervised Document Classification
3	A Study On Weakly-Supervised Text Classification By Incorporating Neural Topic Model For Supervision Generation
4	Study On Topic Model Based Multi-label Text Classification And Stream Text Data Modeling
5	Research On Multi-Label Text Classification Methods Based On Topic Feature
6	Research On Multi-label Learning Problems Based On Topic Model
7	Research On Multi-Label Text Classification Method Based On Deep Learning And Topic Models
8	Research And Application Of The Multi-labeled HDP Text Topic Model
9	Research On Weakly-supervised Classification Methods Based On Samples And Labels Modeling
10	Research On Deep Learning Text Classification Based On Fusion Topic Features