Font Size: a A A

Multi-label Text Categorization Of High Dimensional And Sparse Data Based On Ensemble Learning

Posted on:2019-04-30Degree:MasterType:Thesis
Country:ChinaCandidate:L ChengFull Text:PDF
GTID:2428330590465513Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Nowadays,the human beings have entered an era of information explosion,and need to find their truly requirement from the complex information.Multi-label learning is such a technology,which can lead people to their needs and provide great convenience for their life and production.And the research of multi-label learning has become a hot topic in the field of data mining and machine learning.A sample in multi-label classification is represented by a single instance while associate with a set of labels contrast to single label classification.So usually multi-label classification requires a more compound classification model.With the in-depth study of multi-label classification,researchers begin to pay more and more attention to mining the association between labels in order to improve the classification performance.Text categorization is an important field in multi-label classification.Its data is often multidimensional and sparse.It's easily overfitting when training model directly using this kind of data.As we know,ensemble learning is an effective way to control model overfitting.It uses different strategies to combine a group of weak learner to produce better performance than the best single learner.In view of this,this research studies these problems.For the "dimension disaster" in the text data,it is necessary to reduce the dimension of the text space in order to decrease the complexity of the model and improve the classification performance.Therefore,an ensemble model based-on samples' rules has been proposed.It learns the mapping relationship between features and labels of one instance to train the base learner.As the data is sparse,the base learner is easy to train.This label space of base learner has already implied a label correlations inherent in the sample,so a LP method has been used to train the base learner based on the label correlations.In order to improve the performance of ensemble model,we assign a vector of weights for each base classifier,each dimension of the vector represents the support degree of base classifiers on the corresponding label,then we establish a regressive model to learn the weights.We designed an artificial intelligent judge system for the artificial intelligent judge scene in real life.And we realized of each module function.Contract experiments are conducted in several multi-label text data to verify the effectiveness of the model this paper proposed.According to the experiment results,the model can effectively deal with the high dimensional and sparse text data.
Keywords/Search Tags:multi-label, ensemble, label correlation, high dimensional and sparse
PDF Full Text Request
Related items