
Multi-Label Classification Algorithm Based On Partial Least Squares Regression

Posted on: 2014-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q D Ren
GTID: 2268330425452500
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer hardware and software technology, powerful and practical storage media have emerged. The great increase in storage capacity has driven remarkable progress in database technology, and large numbers of databases are now used to manage the huge volumes of data held on these media. However, the useful knowledge hidden in this data cannot be discovered without powerful management tools. To apply this knowledge effectively to scientific research, fraud detection, production management, market analysis, engineering design and other fields, data mining technology for uncovering such knowledge has developed rapidly.

Classification is an important task in data analysis within data mining research. It builds a classifier from a labeled data set in order to predict the labels of unlabeled data. In traditional classification, each instance has only one label; this is single-label classification. In many real applications, however, each instance is associated with a set of labels simultaneously; this is multi-label classification. In recent years, the prevalence and practical value of multi-label data have attracted great interest from researchers.

As research on multi-label classification has continued, different types of learning methods have been applied to multi-label problems. As a result, a large number of multi-label classification algorithms have been proposed, and they have solved many kinds of real application problems. For instance, canonical correlation analysis, a method from multivariate statistical analysis that studies the relationship between two sets of variables, has been successfully applied to multi-label classification.
Similarly, partial least squares regression, also a method from multivariate statistical analysis, combines features of multiple linear regression and principal component analysis, and can predict one set of variables from another. Partial least squares regression is an extension of least squares regression and was first applied in chemistry. Today it is widely used in economics, water conservancy, environmental protection, electric power and other fields, with effective results. In classification, partial least squares regression has been used as a dimension reduction tool in combination with other classification methods, and has also been used directly as a single-label classification method. Until now, however, it has not been applied directly to multi-label classification.

In this thesis, multi-label classification based on partial least squares regression is studied. The contents are as follows:
(1) Partial least squares regression can relate two sets of variables and can predict the set of dependent variables from the set of independent variables. It is therefore applied to multi-label classification, and a multi-label classification algorithm based on partial least squares regression is proposed. First, a partial least squares regression model based on the nonlinear iterative partial least squares (NIPALS) algorithm is built; then the training data is used to fit a multi-label classification model through partial least squares regression; finally, the model is used to predict the labels of the testing data set.
(2) The multi-label classification algorithm based on partial least squares regression has been tested experimentally on real-world multi-label data sets.
To demonstrate its performance, this multi-label classification algorithm has been compared with other multi-label classification algorithms. The experiments use ten-fold cross-validation and performance evaluation criteria for multi-label classification. The experimental results show that the multi-label classification algorithm based on partial least squares regression is highly competitive with other multi-label classification algorithms.
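
The abstract gives no implementation details, so the following is only a minimal sketch of how the steps in (1) might look, not the thesis's actual algorithm. It fits a PLS2 regression with the standard NIPALS iteration, treating the binary label matrix as the dependent variables. The function names (nipals_pls2, predict_labels, hamming_loss), the 0.5 decision threshold, and the choice of Hamming loss as the evaluation criterion are all assumptions made for illustration; the abstract does not state which threshold or criteria were used.

```python
import numpy as np

def nipals_pls2(X, Y, n_components, max_iter=500, tol=1e-10):
    """Fit PLS2 regression with NIPALS; returns (B, x_mean, y_mean).

    Assumes n_components does not exceed the rank of the centered X
    and that Y is not already fully explained before the last component.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean          # centered residual matrices
    W, P, Q = [], [], []
    for _ in range(n_components):
        # start from the label column with the largest remaining variance
        u = F[:, [np.argmax(F.var(axis=0))]]
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)          # unit-norm X weights
            t = E @ w                       # X scores
            q = F.T @ t / (t.T @ t)         # Y loadings
            u_new = F @ q / (q.T @ q)       # Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)             # X loadings
        E = E - t @ p.T                     # deflate X
        F = F - t @ q.T                     # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    # regression coefficients in the original feature space
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean

def predict_labels(X, B, x_mean, y_mean, threshold=0.5):
    """Turn real-valued PLS outputs into a 0/1 label matrix (threshold is an assumption)."""
    scores = (X - x_mean) @ B + y_mean
    return (scores >= threshold).astype(int)

def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are predicted incorrectly."""
    return float(np.mean(Y_true != Y_pred))
```

In a ten-fold cross-validation setting, one would call nipals_pls2 on the nine training folds, call predict_labels on the held-out fold, and average hamming_loss (or another multi-label criterion) over the ten folds. The number of components is a tuning parameter in this sketch.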
Keywords/Search Tags:multi-label classification, partial least squares regression, performance evaluation criterion, data mining