
Global Optimization Method For Multi-Label Fast Feature Selection

Posted on: 2020-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Sun
Full Text: PDF
GTID: 2428330572980757
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
In recent years, multi-label learning has attracted wide attention and been applied in many fields. However, the feature dimensionality of multi-label data sets is very high, and the data contain many noisy, irrelevant, and redundant features. This not only incurs huge storage and time overhead but also brings on the so-called "curse of dimensionality," making multi-label learning tasks very difficult. Therefore, how to select multi-label features effectively is an important research topic in multi-label learning. Since the 1990s, many researchers have devoted themselves to feature selection, and hundreds of effective methods have been developed. Among them, methods based on information theory form an important branch and have achieved good results. However, most information-theoretic feature selection methods are adapted from single-label feature selection, or perform feature selection with heuristic search strategies that easily fall into local optima. Based on these considerations, this thesis proposes a new feature selection framework based on mutual information. The framework replaces the previous search strategies with a convex optimization strategy for multi-label feature selection, and compares a parametric model with a parameter-free model experimentally. The main work includes:

1. To overcome the shortcomings of traditional multi-label feature selection algorithms in selecting an optimal feature subset, a new optimal-subset selection method is proposed that greatly reduces the time complexity. Building on the classical single-label method mRMR, whose full name is the Max-Relevance Min-Redundancy feature selection method, an improved method is proposed that recasts the previous search problem as a global optimization problem. mRMR assumes that the original feature space contains features that are redundant with one another yet highly correlated with the labels, so searching that space for a subset that maximizes label relevance while minimizing redundancy achieves the purpose of feature selection; a large body of experiments and derivative methods has confirmed the validity of this assumption. However, the time complexity of mRMR, like that of most search-based feature selection methods, is too high and demands substantial computing resources. Chapter 3 addresses this problem with a global optimization strategy and also incorporates label-related information into the feature selection process. A parametric model is thus proposed, the interior penalty function method is used to optimize it, a numerical solution is given, and the corresponding experiments and analysis are reported in Chapter 3. Compared with single-label learning, the labels of a multi-label problem carry much practical information, so exploiting label relationships effectively has become a core topic of recent multi-label learning research; the proposed method incorporates label-correlation information into the feature selection framework to improve the performance and robustness of the algorithm.

2. Although the time complexity of the algorithm proposed in Chapter 3 is much lower than that of the traditional classical algorithms, it is a parametric model, so choosing the optimal parameters wastes considerable computation. To solve this problem, Chapter 4 continues to improve the algorithm built in Chapter 3. An effective way to further improve the robustness of the algorithm is to convert the parametric model into a parameter-free one; therefore Chapter 4 proposes a parameter-free feature selection algorithm, again solved with the interior penalty function method. Extensive experiments and correlation analysis in Chapter 4 verify the validity of the algorithm.

3. Since the interior penalty function method is an approximate solver, its result only approximates the global optimum. Chapter 5 therefore examines an analytic solution as a supplementary solving algorithm and compares it with the interior penalty function method. Although the framework's theory is incomplete on this point, using an analytic solution for the global optimization problem proves feasible in practice: it has very low time complexity and, under computable conditions, yields the theoretical optimal solution, which gives it a great advantage over the interior penalty function method. The analytic method and the interior penalty function method each have their own strengths and weaknesses, so their characteristics are analyzed in detail. This thesis mainly adopts the interior penalty function method as the solver because it imposes no specific requirements on the data set and is robust; however, the superiority of the analytic solution in running time cannot be ignored, so it is listed as supplementary work in Chapter 5. The analytic method is briefly introduced there.

Multi-label feature selection was put forward in the 1990s, nearly 30 years ago, and many excellent research works have been produced since then. However, with the advent of the information age, both the number of samples and the dimensionality of each sample grow exponentially, so earlier multi-label feature selection algorithms cannot cope with today's "big" data. It is therefore necessary to study multi-label feature selection suited to the dimensionality and breadth of today's data.
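The mRMR criterion that the framework builds on can be sketched concretely. The following Python sketch is illustrative only: it uses a tiny plug-in estimator of discrete mutual information and the classical greedy search form of mRMR, not the thesis's global-optimization formulation, and all names are hypothetical.

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in estimate of discrete mutual information I(X;Y) in nats."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def mrmr(X, y, k):
    """Greedy Max-Relevance Min-Redundancy selection of k feature indices."""
    n_feat = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(n_feat)]
    selected = [int(np.argmax(relevance))]   # start with the most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # Penalize redundancy with the features already chosen.
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

As the abstract notes, this greedy search is exactly what makes the classical method expensive: each step re-estimates mutual information against every already-selected feature.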
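The interior penalty (log-barrier) function method used as the solver in Chapters 3 and 4 can be illustrated on a toy constrained problem. The objective, Newton step rule, and parameter values below are assumptions for illustration, not the thesis's actual model:

```python
import numpy as np

def log_barrier_minimize(grad_f, hess_f, w0, mu0=1.0, shrink=0.5,
                         n_outer=20, n_inner=30):
    """Interior (log-barrier) penalty method for  min f(w)  s.t.  w > 0.

    Each outer iteration applies damped Newton steps to the augmented
    objective f(w) - mu * sum(log w), then shrinks the barrier weight mu,
    so the iterates approach the constrained optimum from the interior."""
    w, mu = w0.astype(float).copy(), mu0
    for _ in range(n_outer):
        for _ in range(n_inner):
            g = grad_f(w) - mu / w           # gradient of augmented objective
            h = hess_f(w) + mu / w ** 2      # (diagonal) Hessian of augmented objective
            step = g / h
            # Damp any Newton step that would leave the feasible interior.
            w = np.where(w - step > 0, w - step, w / 2)
        mu *= shrink
    return w

# Toy separable problem: min ||w - c||^2 over w >= 0 with c = (1, -2);
# the constrained optimum is w* = (1, 0).
c = np.array([1.0, -2.0])
w_opt = log_barrier_minimize(lambda w: 2 * (w - c),
                             lambda w: np.full_like(w, 2.0),
                             np.array([0.5, 0.5]))
```

The key property the abstract relies on is visible here: the barrier solution only approaches the constrained optimum as mu shrinks, which is why the result is an approximation of the global optimum rather than the optimum itself.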
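The analytic (closed-form) solution compared in Chapter 5 trades generality for speed: when the optimization reduces to an equality-constrained quadratic program, the KKT conditions give the exact optimum in a single linear solve instead of an iterative barrier loop. The problem form below is a hypothetical stand-in, not the thesis's formulation:

```python
import numpy as np

def analytic_qp(Q, b, budget=1.0):
    """Closed-form solution of  min w^T Q w - b^T w  s.t.  1^T w = budget,
    obtained by solving the KKT linear system in one shot."""
    d = Q.shape[0]
    ones = np.ones((d, 1))
    # KKT system:  [2Q  1] [w  ]   [b     ]
    #              [1^T 0] [lam] = [budget]
    kkt = np.block([[2 * Q, ones], [ones.T, np.zeros((1, 1))]])
    rhs = np.concatenate([b, [budget]])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:d]   # drop the Lagrange multiplier

Q = np.eye(3)
b = np.zeros(3)
w = analytic_qp(Q, b)   # symmetric problem, so the weights come out equal
```

This one-shot solve is what gives the analytic route its time advantage, at the cost of requiring the problem (and data set) to fit this restricted form, which matches the trade-off the abstract describes.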
Keywords/Search Tags: multi-label feature selection, mutual information, convex optimization