
Research On Multi-label Learning Problems Based On Topic Model

Posted on: 2019-01-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Peng
GTID: 1318330545475713
Subject: Computer Science and Technology
Abstract/Summary:
In many real applications, one object can be assigned multiple labels simultaneously; such situations are defined as multi-label problems. Multi-label classification deals with objects that have a set of class labels, where each object is represented by a single instance. Multi-label learning has been widely applied in areas such as text categorization, image analysis, bioinformatics, and web mining. It still faces many difficulties and challenges because of the particular nature of its data sets. One of the most common problems is the massive label output space. To alleviate this, some methods exploit label correlations to reduce the output space during prediction; however, how best to exploit label correlations remains a major problem. In addition, label imbalance is prevalent in multi-label classification. Current methods of correcting for imbalance often use single-label methods, which fail to consider label correlations. To address these issues, the main contributions of this thesis are as follows:

1. We propose a simple and efficient algorithm for multi-label classification, called Multi-Label based on Label Topic (MLLT), which aims at learning global correlations by applying a topic model to the class labels. We regard the label set associated with each instance as a document and each class label in the label set as a word, and then obtain topics in the label space with the topic model. The label-set topics are introduced into the feature space as correlation information to improve prediction. Moreover, the time complexity of the algorithm is as low as that of Binary Relevance (BR). Extensive experiments validate the effectiveness of the proposed approach.

2. Based on MLLT, we propose several extensions to make it more flexible, more accurate, and more widely applicable. First, the basic idea of MLLT is to introduce label correlations through label topics and then reconstruct the multi-label data set, so any state-of-the-art multi-label algorithm can be plugged into our method as the base algorithm. MLLT is thereby developed into a general framework, MLLTM, to which most current approaches can be applied. Adopting a state-of-the-art method as the base method in MLLTM can improve on state-of-the-art performance, while adopting a simple method can bring performance close to the state of the art at low time cost. Second, we discuss how to choose the number of topics. Given an increasing sequence of candidate values, we take them from small to large in turn as the topic number, iteratively execute the steps of MLLTM, and introduce label correlations in each loop. In general, performance increases as more topics are introduced. Finally, we find that in some multi-label data sets a large percentage of instances have only one label. It is hard to exploit correlations from single-label instances, although correlations exist in multi-label instances. To deal with this problem we propose an external framework: we introduce a binary classifier that judges whether each instance has a single label or multiple labels, and add its output to the feature space as a new feature. Experiments validate the effectiveness of the framework on this special kind of data set.

3. We study class imbalance in multi-label classification. As noted above, current imbalance-correction methods often rely on single-label techniques that ignore label correlations. We introduce general frameworks that incorporate topic modeling to address both problems together. We show that these frameworks allow even the most naive methods, such as Binary Relevance, to perform similarly to state-of-the-art methods, and that they can also adapt state-of-the-art methods to perform better than those methods do by themselves. Furthermore, the frameworks handle imbalance among multi-label categories well. The only weakness is that the time cost increases because of the extra training of multi-class classifiers.

4. Traditional Chinese Medicine (TCM) offers a new way of diagnosing Parkinson's disease, and the TCM data for this diagnosis can be abstracted as a multi-label data set. We find that this data set has a high percentage of single-label instances and is also class-imbalanced. We therefore apply the multi-label frameworks proposed above to the problem, comparing the results of traditional multi-label algorithms with those of the frameworks MLLTC, MLLTCS, and MLLTC-IMB. Experiments show that our frameworks cope well with the Parkinson's disease data set. The results can be helpful to doctors, and potential rules uncovered in the mining process can support further research.
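The label-topic idea described in the abstract (each instance's label set is a document, each label is a word, and the resulting topics are appended to the features) can be sketched as follows. This is only an illustrative sketch under assumptions, not the thesis's implementation. The abstract does not say which topic model or inference method is used, so the sketch assumes LDA fitted with a small collapsed Gibbs sampler. The function names `label_topics` and `augment`, the hyperparameters `alpha` and `beta`, and the toy data are all hypothetical. The sketch covers only the training side; how topic features are obtained for unlabeled test instances is not described in the abstract and is left out.

```python
import random

def label_topics(label_sets, n_labels, n_topics, n_iter=200,
                 alpha=0.1, beta=0.01, seed=0):
    """Assumed stand-in for the thesis's topic model: LDA via collapsed Gibbs.

    label_sets[d] is the list of integer label ids of instance d (a "document"
    whose "words" are labels). Returns one topic distribution per instance.
    """
    rng = random.Random(seed)
    # Random initial topic for every label occurrence.
    z = [[rng.randrange(n_topics) for _ in labels] for labels in label_sets]
    doc_topic = [[0] * n_topics for _ in label_sets]
    topic_label = [[0] * n_labels for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, labels in enumerate(label_sets):
        for i, w in enumerate(labels):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_label[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, labels in enumerate(label_sets):
            for i, w in enumerate(labels):
                # Remove the current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_label[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this label.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_label[t][w] + beta)
                    / (topic_total[t] + n_labels * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_label[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-instance topic proportions; each row sums to 1.
    theta = []
    for d, labels in enumerate(label_sets):
        denom = len(labels) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom
                      for t in range(n_topics)])
    return theta

def augment(features, theta):
    """Append each instance's label-topic proportions to its feature vector."""
    return [list(x) + list(t) for x, t in zip(features, theta)]

# Toy data (hypothetical): 4 instances, 2 features, 4 possible labels.
X = [[0.2, 1.0], [0.1, 0.9], [0.8, 0.1], [0.9, 0.3]]
Y = [[0, 1], [0, 1], [2, 3], [2]]
theta = label_topics(Y, n_labels=4, n_topics=2)
X_aug = augment(X, theta)
```

Here `X_aug` has the original two features plus two topic proportions per instance, and any base multi-label learner (for example, one binary classifier per label, as in Binary Relevance) could then be trained on it.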
Keywords/Search Tags: multi-label learning, label correlations, topic model, class imbalance