
Research On Multi-objective Optimization For Multi-label Feature Selection Methods

Posted on: 2023-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Sun
Full Text: PDF
GTID: 1528306905990709
Subject: Software engineering
Abstract/Summary:
In the past ten years, information technology has developed rapidly, and the Internet has generated large amounts of data across many fields. Processing these data effectively can support enterprise decision-making and product optimization, and can drive the development of artificial intelligence. Much of this data contains regular patterns, and researchers use a variety of technical means to mine them. However, the nature of the data itself poses many challenges to this goal. For example, data collection is affected by various factors that produce noisy or missing data, non-numerical structured data is difficult to handle, and high-dimensional features contain complex relationships. If large-scale data is fed directly into a machine learning method without preprocessing, the accuracy of the results can be seriously affected. Feature selection is an important preprocessing tool that detects relevant features and filters out irrelevant and redundant ones. It not only reduces the scale of the data but also helps improve the performance of machine learning, including solution accuracy, running speed, and generalization ability.

Multi-label data is a special type of research object in supervised learning: each sample is associated with several labels simultaneously, which together describe its label information. Compared with single-label data, it is more difficult to mine pattern knowledge from such data, and for data with high dimensionality, fuzziness, and noise, feature selection plays an especially important role. At present, multi-label feature selection faces many challenges, such as multi-directional interactions between features, a huge search space, correlations between features and multiple labels, and the complexity of evaluating multi-label classification. In response to these challenges, this dissertation treats multi-label feature selection as a multi-objective optimization problem and exploits the efficient global search mechanism of multi-objective evolutionary algorithms. Starting from chromosome encoding, the design of the core operators, and the construction of the objective functions, and drawing on information theory and game theory, it studies different types of feature selection algorithms for multi-label data and provides effective technical means for reducing data dimensionality. The main research contents and innovations are as follows.

(1) A multi-label feature selection algorithm based on dual particle swarm optimization is proposed. We construct a dual particle swarm optimization framework, use mutual information to analyze the relevance between features and the label set and the redundancy among features, and assign these two conflicting objectives to two particle swarms for optimization. Because the binary particle swarm algorithm has insufficient global optimization capability, a hybrid ring-global topology is studied to increase the search ability and diversity of the traditional particle swarm algorithm and to avoid falling into local optima. An improved crowding distance mechanism is proposed to maintain the non-dominated solutions in the archive and obtain a set of Pareto optimal solutions. Finally, the overall optimization framework of the algorithm is given.

(2) A multi-label feature selection algorithm based on the Shapley value is proposed. This method aims to improve the accuracy of multi-label classification and designs objective functions based on multi-label classification results. The idea of cooperative game theory is integrated with feature selection: each feature is regarded as a player in the game, and the purpose of the game is to increase the overall benefit obtained. The Shapley value is used to calculate the marginal contribution of each feature, which helps identify relevant, redundant, and useless features. Adaptive crossover and mutation operators based on the Shapley value are proposed to balance global and local search. Experiments verify the effectiveness of the selected feature subsets and of the multi-objective evolutionary algorithm on different metrics.

(3) A multi-label feature selection algorithm based on many-objective optimization is proposed. From the perspective of multi-label classification, multi-label feature selection can be regarded as a many-objective combinatorial optimization problem. A many-objective evolutionary algorithm is used to solve it, so that the evaluation of results is comprehensive. To counter the rapid increase in the proportion of non-dominated solutions as the number of objectives grows, NSGA-III combined with two archives is proposed. The archives CA and DA focus on improving the convergence and the diversity of the algorithm, respectively. Introducing convergence information increases the selection pressure on non-dominated solutions and improves the quality of those selected. Crossover and mutation operators based on label frequency differences are proposed to avoid generating duplicate individuals. Because CA and DA pursue different optimization goals, different evolution strategies and archive maintenance strategies are designed for them: CA uses a convergence-information maintenance strategy, and DA uses a niche-based diversity maintenance strategy. Transferring excellent individuals between the archives increases communication between the two populations.

(4) A novel combined multi-label feature selection algorithm is proposed. Improving both the classification accuracy of feature subsets and the search speed are important goals of feature selection, and it is difficult to achieve both at the same time. Since filter methods and wrapper methods each excel at one of these goals, this work combines their advantages and proposes a combined filter-wrapper multi-label feature selection algorithm. In the filter model, a distance measure for multi-label data is proposed that aims to maximize inter-class distance and minimize intra-class distance. The optimization objectives of the wrapper model are to maximize Average Precision and minimize Hamming loss. This part of the work aims to reduce computation time while maintaining classification accuracy. The experimental results verify the feasibility and effectiveness of this combined approach.

The multi-label feature selection problem is a complex optimization problem, and its multi-label nature makes it more difficult than traditional feature selection. First, not only are the variables in the decision space correlated, but a single variable may also be associated with multiple labels, which increases the complexity of the optimization. Second, the high dimensionality of the objective space makes it harder to balance the diversity, convergence, and complexity of the algorithm. Facing these difficulties, this dissertation proposes four multi-objective multi-label feature selection algorithms from four angles: dual-population optimization, feature contribution analysis, consideration of multiple evaluation criteria, and reduction of computational overhead. These results offer corresponding solutions for different performance requirements.
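Contribution (1) scores a candidate feature subset by its relevance to the label set and by the redundancy among its features, both measured with mutual information. The exact formulas are not given in this abstract, so the following is only a minimal sketch under stated assumptions: features and labels are discrete-valued columns, relevance is the mean mutual information between each selected feature and each label, and redundancy is the mean pairwise mutual information among selected features (an mRMR-style averaging chosen here for illustration). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2
from typing import Hashable, List, Sequence

Column = Sequence[Hashable]  # one discrete feature or label column


def mutual_information(x: Column, y: Column) -> float:
    """Empirical mutual information I(X;Y) in bits between two discrete columns."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def relevance(features: List[Column], labels: List[Column], subset: List[int]) -> float:
    """Mean MI between every selected feature and every label (objective to maximize).

    Assumed averaging scheme; the dissertation's exact formula may differ.
    """
    scores = [mutual_information(features[f], lab) for f in subset for lab in labels]
    return sum(scores) / len(scores) if scores else 0.0


def redundancy(features: List[Column], subset: List[int]) -> float:
    """Mean pairwise MI among the selected features (objective to minimize)."""
    pairs = list(combinations(subset, 2))
    if not pairs:
        return 0.0
    return sum(mutual_information(features[i], features[j]) for i, j in pairs) / len(pairs)


# Hypothetical toy data: 3 discrete features, 1 binary label, 4 samples.
features = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
labels = [[0, 0, 1, 1]]
print(relevance(features, labels, [0, 2]), redundancy(features, [0, 1]))
```

In a dual-swarm setting like the one described, one swarm would push `relevance` up while the other pushes `redundancy` down; selecting the duplicated features 0 and 1 together is what makes the two goals conflict.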
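Contribution (1) also maintains the archive of non-dominated solutions with a crowding distance mechanism. The dissertation's improved variant is not detailed in this abstract; the sketch below instead uses the standard NSGA-II crowding distance, treats every objective as minimized, and trims an over-full archive by repeatedly discarding the most crowded solution. The names `update_archive` and `capacity` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # objective vector; every objective is minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def crowding_distance(points: List[Point]) -> List[float]:
    """Standard NSGA-II crowding distance (not the improved variant); boundaries get inf."""
    n = len(points)
    dist = [0.0] * n
    if n == 0:
        return dist
    for k in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][k])
        lo, hi = points[order[0]][k], points[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # no spread on this objective
        for pos in range(1, n - 1):
            gap = points[order[pos + 1]][k] - points[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


def update_archive(archive: List[Point], candidates: List[Point], capacity: int) -> List[Point]:
    """Keep distinct non-dominated points; drop the most crowded until within capacity."""
    pool = [tuple(p) for p in archive] + [tuple(p) for p in candidates]
    front = [p for p in pool if not any(dominates(q, p) for q in pool)]
    kept = list(dict.fromkeys(front))  # remove exact duplicates, keep order
    while len(kept) > capacity:
        dist = crowding_distance(kept)
        kept.pop(dist.index(min(dist)))  # recompute distances after every removal
    return kept
```

Removing one point at a time and recomputing is slower than a single sort, but it avoids deleting two neighbours whose distances were computed relative to each other.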
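Contribution (2) treats features as players and uses the Shapley value to measure each feature's marginal contribution. The sketch below computes Shapley values for an arbitrary characteristic function `v`, exactly by enumerating coalitions (practical only for a handful of features) and approximately by averaging marginal gains over random feature orderings. In the dissertation the payoff is derived from multi-label classification results; here `v` is a hypothetical caller-supplied placeholder.

```python
import random
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

# Placeholder characteristic function: payoff of a coalition (feature subset).
ValueFn = Callable[[FrozenSet[int]], float]


def shapley_exact(n: int, v: ValueFn) -> List[float]:
    """Exact Shapley values by enumerating every coalition S that excludes player i."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                s = frozenset(members)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def shapley_monte_carlo(n: int, v: ValueFn, samples: int, seed: int = 0) -> List[float]:
    """Estimate Shapley values from marginal gains along random player orders."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(samples):
        order = list(range(n))
        rng.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        previous = v(coalition)
        for i in order:
            coalition = coalition | {i}
            current = v(coalition)
            phi[i] += current - previous  # marginal contribution of i in this order
            previous = current
    return [p / samples for p in phi]
```

As a toy illustration, `v = lambda s: 1.0 if (0 in s or 1 in s) else 0.0` on three features makes features 0 and 1 interchangeable: `shapley_exact(3, v)` splits the payoff between them at 0.5 each and gives feature 2 zero, which is how the Shapley value can separate redundant features from useless ones.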
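Contribution (3) builds on NSGA-III, which keeps many-objective populations diverse by associating each solution with a set of structured reference points. The two-archive extension (CA and DA) is not reproduced here. The sketch only shows the standard Das-Dennis construction of reference points on the unit simplex and the association of objective vectors to the nearest reference line by perpendicular distance; normalization of objectives (ideal and extreme points) is assumed to have been done by the caller.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def das_dennis(m: int, h: int) -> List[Tuple[float, ...]]:
    """Reference points on the unit simplex for m objectives with h divisions.

    Stars and bars: each choice of m-1 divider slots among h+m-1 slots yields
    non-negative integers summing to h, scaled by 1/h. Total: C(h+m-1, m-1).
    """
    points = []
    for dividers in combinations(range(h + m - 1), m - 1):
        coords, previous = [], -1
        for d in dividers:
            coords.append((d - previous - 1) / h)  # stars before this divider
            previous = d
        coords.append((h + m - 2 - previous) / h)  # stars after the last divider
        points.append(tuple(coords))
    return points


def perpendicular_distance(f: Sequence[float], w: Sequence[float]) -> float:
    """Distance from point f to the line through the origin along direction w."""
    t = sum(a * b for a, b in zip(f, w)) / sum(b * b for b in w)
    return sqrt(sum((a - t * b) ** 2 for a, b in zip(f, w)))


def associate(normalized: List[Sequence[float]], refs: List[Tuple[float, ...]]) -> List[int]:
    """Index of the nearest reference line for each (already normalized) vector."""
    return [
        min(range(len(refs)), key=lambda r: perpendicular_distance(f, refs[r]))
        for f in normalized
    ]
```

Counting how many solutions attach to each reference point is what NSGA-III's niching step uses to prefer under-represented regions of the objective space.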
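The wrapper model in contribution (4) maximizes Average Precision and minimizes Hamming loss. The following is a minimal sketch of the usual definitions of both metrics over binary label matrices and real-valued label scores. Two conventions are assumptions made here, not taken from the dissertation: tied scores are ranked pessimistically, and instances with no relevant labels are skipped when averaging precision.

```python
from typing import List, Sequence


def hamming_loss(y_true: List[Sequence[int]], y_pred: List[Sequence[int]]) -> float:
    """Fraction of instance-label pairs predicted incorrectly (lower is better)."""
    n, q = len(y_true), len(y_true[0])
    wrong = sum(t != p for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p))
    return wrong / (n * q)


def average_precision(y_true: List[Sequence[int]], scores: List[Sequence[float]]) -> float:
    """Multi-label ranking Average Precision (higher is better).

    For each relevant label j: (relevant labels scored >= s[j]) / (all labels
    scored >= s[j]); averaged over relevant labels, then over instances.
    """
    total, counted = 0.0, 0
    for truth, s in zip(y_true, scores):
        relevant = [j for j, t in enumerate(truth) if t]
        if not relevant:
            continue  # assumption: instances without relevant labels are skipped
        precision_sum = 0.0
        for j in relevant:
            rank = sum(1 for x in s if x >= s[j])  # ties counted pessimistically
            hits = sum(1 for k in relevant if s[k] >= s[j])
            precision_sum += hits / rank
        total += precision_sum / len(relevant)
        counted += 1
    return total / counted if counted else 0.0
```

A wrapper objective vector for one candidate subset could then be `(-average_precision(...), hamming_loss(...))`, so that both components are minimized.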
Keywords/Search Tags: Feature selection, Data preprocessing, Multi-label data classification, Multi-objective evolution, Combinatorial optimization