Research On Multi-Label Learning Based On Label-Specific Features And Label Correlations

Posted on: 2020-08-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Weng
Full Text: PDF
GTID: 1488306020967079
Subject: Systems Engineering
Abstract/Summary:
Traditional classification tasks assume that each instance is assigned exactly one label from a finite label set. In real-world applications, however, an instance often has complex semantics and needs to be described by multiple labels simultaneously. For example, an image can contain objects such as "sea", "bridge" and "pedestrian" at the same time, and a document can carry both the keywords "economy" and "expo". Multi-label classification is an important research field in machine learning whose task is to predict all of the correct labels for each instance. It has been widely applied in bioinformatics, Web mining, information retrieval, personalized recommendation, social networks and other fields.

Because a multi-label problem must describe multiple semantics of real objects, the number of features often reaches hundreds or even thousands. A high-dimensional feature space has many adverse effects on classification algorithms, such as a greater computational burden, model overfitting and degraded classification performance. Feature reduction, which aims to obtain low-dimensional features with strong classification ability, has therefore become one of the most popular research topics in multi-label learning. Furthermore, each label expresses specific semantics and can be expected to possess features of its own, which we call "label-specific features". For example, in automatic image annotation, texture-based features might be useful in discriminating "desert" from "non-desert" images. The goal of label-specific feature extraction is to build a low-dimensional feature space for each label and to learn an efficient classification model on it. On the other hand, labels in a multi-label problem rarely occur independently; they depend on one another, which poses great challenges for multi-label classification. How to use the correlations among labels to improve classification performance has therefore become another active topic in multi-label learning.

In short, label-specific feature extraction and the exploitation of label correlations have been two major driving forces of multi-label classification research in recent years. This dissertation considers them together and proposes a series of classification algorithms. The main research work and innovations are as follows:

1. To address the problem that existing label-specific feature algorithms usually neglect label correlations, an algorithm named LF-LPLC is proposed, based on label-specific features and local pairwise label correlations. First, the original feature space is transformed into a low-dimensional label-specific feature space through cluster analysis, so that each label has its own feature representation. Then, k-nearest-neighbor techniques are used to explore the local pairwise label correlations. Based on these correlations, a method of extending the label-specific features of each label by oversampling is proposed for the first time. The experimental results verify the validity of the proposed algorithm and show that the performance of multi-label classifiers can be further improved by using label correlations to extend the label-specific features.

2. Existing algorithms that extract label-specific features by feature selection often assume that these features are sparse, that is, that they come from a small subset of the original feature space. Sparse label-specific features are not always suitable for building efficient classification models. For such cases, we propose the NSLSF algorithm for non-sparse label-specific feature extraction and classification, based on quadratic programming and linear regression. The algorithm first converts the original logical labels into numeric ones, which convey more semantic information, by solving a quadratic program that incorporates label correlations. It then obtains the label-specific features by linear regression in the numeric label space. Based on these label-specific features, a binary classification model can be learned for each label. In addition, the learned linear regression parameters can be used directly for classification. NSLSF is therefore both a label-specific feature extraction algorithm and a classification algorithm. Extensive experiments show that NSLSF is effective in both roles.

3. To solve the problem that traditional stacked binary relevance (BR) introduces noisy and redundant features when it uses label correlations, an effective two-layer stacked BR algorithm named SMBPO is proposed, in which Pareto optimality is used to select label-specific features and these features are used to expand the feature space of each instance. The key problem of stacking-based BR is how to choose which first-layer prediction results are used to expand the original feature space. SMBPO first evaluates the correlations between labels and obtains the corresponding evaluation matrix. It then uses Pareto optimality to select the label-specific features from the first-layer BR outputs and extends the original feature space with them to train the second-layer BR. The effectiveness of SMBPO is verified by experimental results on multiple multi-label benchmark data sets.

4. Traditional classifier chain algorithms suffer from several problems, such as an unoptimized label order and redundancy and noise in both the original and the newly added features. To solve these problems, the algorithm LSF-CC is proposed, based on label-specific features. In this algorithm, the correlations between features and labels are evaluated first, and the position of each label in the chain is then determined according to these correlations. For each label in the chain, the label-specific features are selected, according to the correlations, from the original features and from the preceding labels in the chain respectively, and a binary classifier is then trained on these label-specific features. Extensive experimental results show that the proposed method clearly improves the performance of multi-label classifier chains.
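
To make the classifier-chain idea in item 4 concrete, the following is a minimal sketch of a chain that orders labels and selects per-label features by correlation. It is not the LSF-CC algorithm itself: the abstract does not state which correlation measure, ordering rule, selection rule or base classifier LSF-CC uses, so the Pearson correlation, the "strongest label first" ordering, the top-k cut-off and the nearest-centroid classifier below are all stand-in assumptions, and the function names are hypothetical.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def column(rows, j):
    return [r[j] for r in rows]

def fit_centroids(rows, targets):
    """Stand-in base classifier: one mean vector per class present in the data."""
    cents = {}
    for c in (0, 1):
        members = [r for r, t in zip(rows, targets) if t == c]
        if members:
            cents[c] = [mean(col) for col in zip(*members)]
    return cents

def predict_centroid(cents, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, cents[c])))

def fit_chain(X, Y, k=2):
    n_feat, n_lab = len(X[0]), len(Y[0])
    # Assumed ordering rule: labels more strongly correlated with the features go first.
    strength = [mean([abs(pearson(column(X, j), column(Y, l))) for j in range(n_feat)])
                for l in range(n_lab)]
    order = sorted(range(n_lab), key=lambda l: -strength[l])
    chain = []
    aug = [list(x) for x in X]  # original features, later extended with earlier labels
    for l in order:
        y = column(Y, l)
        # Assumed selection rule: keep the k columns most correlated with this label.
        scores = [abs(pearson(column(aug, j), y)) for j in range(len(aug[0]))]
        keep = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
        model = fit_centroids([[r[j] for j in keep] for r in aug], y)
        chain.append((l, keep, model))
        for r, t in zip(aug, y):
            r.append(t)  # true label values are appended during training
    return chain

def predict_chain(chain, x, n_labels):
    aug, out = list(x), [0] * n_labels
    for l, keep, model in chain:
        out[l] = predict_centroid(model, [aug[j] for j in keep])
        aug.append(out[l])  # predicted label feeds the next link of the chain
    return out

# Toy data: label 0 equals feature 0; label 1 is "feature 0 AND feature 1".
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 0], [1, 0], [1, 1]]
chain = fit_chain(X, Y, k=2)
predictions = [predict_chain(chain, x, 2) for x in X]
```

On this toy data the second link of the chain selects one original feature and the label predicted by the first link, which is the kind of mixed "original feature plus preceding label" input that item 4 describes. A real implementation would replace each assumed component with the measure and classifier defined in the dissertation.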
Keywords/Search Tags: Supervised Learning, Multi-Label Classification, Label Correlations, Label-Specific Features, Feature Transformation, Feature Selection