Multi-label Text Categorization Of High Dimensional And Sparse Data Based On Ensemble Learning

Posted on:2019-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:L Cheng

GTID:2428330590465513

Subject:Computer technology

Abstract/Summary:

We live in an era of information explosion, in which people must pick out what they actually need from a mass of complex information. Multi-label learning is one technology that helps with this, and it has become an active research topic in data mining and machine learning. In contrast to single-label classification, a sample in multi-label classification is represented by a single instance but is associated with a set of labels, so multi-label classification usually requires a more complex model. As the study of multi-label classification has deepened, researchers have paid increasing attention to mining the correlations between labels in order to improve classification performance.

Text categorization is an important application of multi-label classification. Text data is often high-dimensional and sparse, and a model trained directly on such data overfits easily. Ensemble learning is an effective way to control overfitting: it uses different strategies to combine a group of weak learners so that the ensemble performs better than the best single learner. This thesis studies these problems from that perspective.

To address the "curse of dimensionality" in text data, the dimension of the text space must be reduced in order to lower model complexity and improve classification performance. An ensemble model based on sample rules is therefore proposed. Each base learner is trained on the mapping between the features and the labels of a single instance; because the data is sparse, such a base learner is easy to train. The label space of a base learner already implies the label correlations inherent in the sample, so the LP method is used to train the base learner on those correlations. To improve the performance of the ensemble, each base classifier is assigned a weight vector in which each dimension represents the classifier's degree of support for the corresponding label, and a regression model is established to learn the weights.

An artificial-intelligence judge system was also designed for a real-life AI judge scenario, and the function of each of its modules was implemented. Comparative experiments on several multi-label text datasets verify the effectiveness of the proposed model. According to the experimental results, the model can effectively handle high-dimensional and sparse text data.
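The two mechanisms named in the abstract can be illustrated with a small sketch. It assumes that "LP" refers to the Label Powerset transformation, in which each distinct label set is treated as one class. The function names `label_powerset_encode` and `weighted_vote`, the 0.5 threshold, and the hand-set weights are hypothetical choices for illustration; in the thesis the weights are learned by a regression model, which is not reproduced here.

```python
def label_powerset_encode(label_sets):
    """Map each distinct label set to one class id (Label Powerset)."""
    class_of = {}   # frozenset of labels -> class id
    encoded = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    return encoded, class_of


def weighted_vote(predictions, weights, labels, threshold=0.5):
    """Combine base-learner label sets with one weight per (learner, label).

    predictions: one predicted label set per base learner
    weights:     one weight vector per base learner, aligned with `labels`
    A label is kept when the weighted share of learners predicting it
    reaches `threshold` (an assumed decision rule).
    """
    result = set()
    for j, label in enumerate(labels):
        total = sum(w[j] for w in weights)
        support = sum(w[j] for pred, w in zip(predictions, weights)
                      if label in pred)
        if total > 0 and support / total >= threshold:
            result.add(label)
    return result


if __name__ == "__main__":
    # Label sets that co-occur in training become single LP classes.
    encoded, class_of = label_powerset_encode(
        [{"sports"}, {"sports", "politics"}, {"sports"}]
    )
    print(encoded, len(class_of))

    # Three base learners vote on two labels with hand-set weights.
    labels = ["a", "b"]
    predictions = [{"a"}, {"a", "b"}, {"b"}]
    weights = [[0.6, 0.7], [0.3, 0.2], [0.1, 0.1]]
    print(weighted_vote(predictions, weights, labels))
```

Keeping a separate weight per label lets a base learner that is reliable on one label but weak on another contribute unevenly to the final decision, which is the role the abstract gives to the weight vectors.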

Keywords/Search Tags:

multi-label, ensemble, label correlation, high dimensional and sparse

Related items

1	Text Categorization Of High Dimensional Imbalanced Data Based On Depth Label Correlation Mining
2	Research On Multi-Label Classification Algorithms And Their Applications
3	Research On The Multi-label Classification Methods With The Label Embedding And Structure Information
4	Research On The Multi-label Feature Selection And Classification Methods With The Label Correlations
5	Research On Several Issues Of Multi-Label Feature Representation
6	Research On Multi-label KNN By Exploiting Label Correlation
7	Multi-Label Learning With Label Correlation Based On Rbf Network
8	Multi-label Classification Research Based On Label-specific Features And Label Correlation
9	Research On Multi-label Classification Algorithms Based On Samples And Property Analysis
10	Research On Multi-label Learning And Algorithms Based On Data And Label Correlations