Multi-label Text Classification Based On Long Short-Term Memory

Posted on: 2018-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: T Xiong
Full Text: PDF
GTID: 2348330515459764
Subject: Computer Science and Technology
Abstract/Summary:
Classification is a core problem in artificial intelligence. As the semantics of real-world text grow richer, a single text can be assigned to multiple labels, and multi-label text classification has become increasingly important for content indexing and management. Although text classification has been widely studied, the multi-label case remains hard: its complexity grows exponentially with the number of class labels, which makes traditional algorithms inapplicable. This thesis therefore studies multi-label text classification as follows.

(1) After analyzing the shortcomings of traditional text classification algorithms, a model based on hierarchical long short-term memory (LSTM) and word2vec is proposed. The model learns features automatically from the training texts at the sentence level and the document level, and builds the representation of the whole document hierarchically.

(2) Two strategies for multi-label text classification are built on the proposed model. The first ranks the labels using multi-logistic regression and then applies a dynamic threshold calibration technique to obtain the prediction. The second uses the structural relationships among labels to construct a label tree and trains multiple classifiers in the tree for joint prediction; several criteria for joint prediction are proposed under this strategy.

(3) A series of experiments on the New York Times data set compares the proposed model with baseline models on a number of indicators. A further series of experiments analyzes the effects of the different prediction criteria.

The contributions of this thesis are as follows.

(1) A hierarchical long short-term memory model based on multi-logistic regression and dynamic threshold calibration is proposed. In the experiments it improves significantly on the baseline model (subset accuracy improvement of 38%, F1 score improvement of 23%).

(2) A label tree is constructed from the structural relationships among labels, and a classifier is trained for every inner node of the tree for joint prediction. In addition, the A* search algorithm is applied to joint prediction in a novel way: different prediction criteria are implemented by using different definitions of the edge weights in the label tree. In the experiments this strategy improves further on the model described in (1) (subset accuracy improvement of 12%, F1 score improvement of 2.5%).
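The joint prediction over a label tree can be illustrated with a minimal sketch. It is not the thesis's implementation; the abstract does not give the actual criteria or edge-weight definitions. The sketch assumes that each inner-node classifier yields a probability for each child, that an edge weight is the negative log of that probability, and that the prediction is a single root-to-leaf path. The function name, the tree, and the probabilities are hypothetical. With the default zero heuristic, A* reduces to uniform-cost search.

```python
import heapq
import math


def best_label_path(tree, edge_prob, root, heuristic=lambda node: 0.0):
    """Return (path, probability) of the most probable root-to-leaf path.

    tree:      dict mapping a node to its list of children (leaves are absent)
    edge_prob: dict mapping (parent, child) to a probability in (0, 1]
    heuristic: lower bound on the remaining cost; zero is always admissible
    """
    # Frontier entries are (g + h, g, path); g is the summed -log probability.
    frontier = [(heuristic(root), 0.0, [root])]
    while frontier:
        _, cost, path = heapq.heappop(frontier)
        node = path[-1]
        children = tree.get(node, [])
        if not children:
            # The first leaf popped has the lowest total cost.
            return path, math.exp(-cost)
        for child in children:
            p = edge_prob[(node, child)]
            if p <= 0.0:
                continue  # an impossible edge cannot be on the best path
            g = cost - math.log(p)
            heapq.heappush(frontier, (g + heuristic(child), g, path + [child]))
    return None, 0.0


# Hypothetical label tree and classifier outputs, for illustration only.
tree = {
    "root": ["news", "sports"],
    "news": ["politics", "business"],
    "sports": ["tennis"],
}
edge_prob = {
    ("root", "news"): 0.6,
    ("root", "sports"): 0.4,
    ("news", "politics"): 0.5,
    ("news", "business"): 0.5,
    ("sports", "tennis"): 0.9,
}
path, prob = best_label_path(tree, edge_prob, "root")
print(path, round(prob, 2))
```

In this toy tree a greedy top-down choice would follow "news" at the root (0.6 against 0.4) and end on a path with probability 0.3, while the joint search returns the "sports" path with probability 0.36. Under the stated assumption, changing how the edge weights are defined changes the criterion being optimized without changing the search itself.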
Keywords/Search Tags:Multi-label Text Classification, Hierarchical Long Short-Term Memory, Label Ranking, Label Tree, Optimal Path Searching