Multi-label Text Classification Based On Long Short-Term Memory

Posted on:2018-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:T Xiong

Full Text:PDF

GTID:2348330515459764

Subject:Computer Science and Technology

Abstract/Summary:

Classification is a core problem in artificial intelligence. As the semantics of real-world text grow richer, a single text can belong to several categories at once, which makes multi-label text classification important for content indexing and management. Although text classification has been widely studied, the multi-label case remains hard: its complexity grows exponentially with the number of class labels, which makes traditional algorithms inapplicable.

This thesis studies multi-label text classification as follows. (1) After analyzing the shortcomings of traditional text classification algorithms, a model based on hierarchical long short-term memory and word2vec is proposed. The model learns features automatically from the training texts at the sentence level and the document level, and builds the representation of the whole document hierarchically. (2) Two strategies for multi-label text classification are built on the proposed model. The first ranks the labels using multi-logistic regression and then applies a dynamic threshold calibration technique to obtain the prediction. The second uses the structural characteristics among labels to construct a label tree and trains multiple classifiers in the tree for joint prediction; several criteria are proposed for this joint prediction. (3) Experiments on the New York Times data set compare the proposed model with baseline models on a number of indicators, and further experiments analyze the effects of the different prediction criteria.

The contributions of this thesis are: (1) A hierarchical long short-term memory model based on multi-logistic regression and dynamic threshold calibration is proposed. In the experiments it improves significantly on the baseline model (subset accuracy improves by 38% and F1 score by 23%). (2) A label tree is constructed from the structural characteristics among labels, and a classifier is trained for every inner node of the tree for joint prediction. In addition, the A* search algorithm is applied to joint prediction in a novel way: different prediction criteria are implemented by using different definitions of the edge weights in the label tree. In the experiments this strategy improves further on the first model described above (subset accuracy improves by 12% and F1 score by 2.5%).
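The first strategy in the abstract (rank the labels, then cut the ranking with a per-document threshold) can be illustrated with a small sketch. The abstract does not say how the threshold is calibrated, so the largest-gap rule below is an assumed stand-in, not the thesis's technique; the function name `predict_labels`, the `min_labels` parameter, and the example scores are all hypothetical.

```python
from typing import Dict, List


def predict_labels(scores: Dict[str, float], min_labels: int = 1) -> List[str]:
    """Rank labels by score and cut the ranking at the largest score gap.

    Assumption: the cut-off rule is a simple heuristic chosen for
    illustration; the thesis's calibration method is not described
    in the abstract.
    """
    # Sort labels from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= min_labels:
        return [label for label, _ in ranked]

    best_cut, best_gap = min_labels, -1.0
    # Consider every cut position that keeps at least min_labels labels.
    for i in range(min_labels - 1, len(ranked) - 1):
        gap = ranked[i][1] - ranked[i + 1][1]
        if gap > best_gap:
            best_gap, best_cut = gap, i + 1
    return [label for label, _ in ranked[:best_cut]]


# Hypothetical per-label scores for one document.
example = {"sports": 0.91, "politics": 0.85, "business": 0.20, "arts": 0.05}
print(predict_labels(example))  # ['sports', 'politics']
```

Because the cut depends on the score distribution of each document, the number of predicted labels varies from one document to the next, which is the point of a dynamic threshold as opposed to a fixed one.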
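The second strategy searches a label tree whose edge weights encode the prediction criterion. As a rough sketch under stated assumptions: each inner node's classifier is represented by a fixed table of child probabilities, the edge weight is taken to be the negative log probability, and the search is A* with a zero heuristic (which reduces to uniform-cost search). The tree contents, the weight definition, and the name `best_path` are hypothetical; the abstract does not give the thesis's actual weights, heuristic, or how label sets map onto the tree.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical label tree: each inner node maps to {child: probability},
# standing in for the output of that node's trained classifier.
TREE: Dict[str, Dict[str, float]] = {
    "root": {"news": 0.7, "opinion": 0.3},
    "news": {"sports": 0.6, "politics": 0.4},
    "opinion": {"editorial": 0.9, "letters": 0.1},
}


def best_path(tree: Dict[str, Dict[str, float]],
              root: str = "root") -> Tuple[List[str], float]:
    """Return the most probable root-to-leaf path and its probability.

    Edge weight = -log(p), so the cheapest path is the most probable one.
    With non-negative weights and a zero heuristic, the first leaf popped
    from the priority queue is optimal.
    """
    frontier: List[Tuple[float, List[str]]] = [(0.0, [root])]
    while frontier:
        cost, path = heapq.heappop(frontier)
        children = tree.get(path[-1])
        if not children:  # a leaf: no classifier below this node
            return path, math.exp(-cost)
        for child, p in children.items():
            if p > 0.0:  # skip impossible edges (log(0) is undefined)
                heapq.heappush(frontier, (cost - math.log(p), path + [child]))
    return [], 0.0


path, prob = best_path(TREE)
print(path, round(prob, 2))  # ['root', 'news', 'sports'] 0.42
```

Swapping in a different edge-weight definition changes which path is considered best without touching the search itself, which mirrors the abstract's remark that different criteria are implemented through different edge weights.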

Keywords/Search Tags:

Multi-label Text Classification, Hierarchical Long Short-Term Memory, Label Ranking, Label Tree, Optimal Path Searching

Related items

1	Identifying Labels From Multi-label Texts Using Deep Learning
2	Research On Multi-label Classification Algorithm Based On Label Relationship
3	Research And Application Of Hierarchical Multi-label Classification Algorithm
4	Research On Multi-label Learning And Algorithms Based On Data And Label Correlations
5	Research On Multi-label Classification Algorithms Based On Samples And Property Analysis
6	Multi-Label Learning Based On Exploiting Label Dependency
7	Research On Hierarchical Classification Based On Label Distribution
8	Research On Multi-label Classification Related Technology
9	Towards Multi-label Classification
10	Recognition Of Applicable Laws Based On Hierarchical Multi-label Classification