With the rapid development of information technology, data are becoming increasingly high-dimensional. Feature selection, one of the effective methods for dimensionality reduction, has the advantages of retaining the original features, reducing computational cost and improving task efficiency. Feature selection must simultaneously maximize classification accuracy and minimize the number of selected features, and is therefore a multi-objective optimization problem. However, multi-objective optimization based on the Pareto dominance mechanism provides poor coverage of the edge regions of the Pareto front. Multi-objective optimization based on the decomposition mechanism divides the objective space into multiple sub-problems, each of which exerts its own selection pressure, thereby improving the diversity of solutions. To balance the selection pressure, retain more edge solutions and alleviate the poor diversity obtained on problems with discrete Pareto fronts, this thesis proposes two decomposition-based feature selection methods for multi-objective particle swarm optimization. The main contents are as follows:

(1) To mitigate the truncation-based selection of solutions, balance the selection pressure on particles during multi-objective feature selection and obtain a set of multi-objective solutions with better distribution, a multi-objective particle swarm optimization feature selection algorithm based on PBI decomposition with a cyclic penalty and opposition-based learning (MOPSO-CPBIOBL-FS) is proposed. A cyclic penalty boundary intersection strategy is introduced to address the poor distribution and convergence caused by solution truncation. The subspaces are further subdivided: in the first half of each cycle, loose penalty parameters are set in every subspace to enhance the search ability and improve convergence; in the second half, strict penalty parameters are set in every subspace to maintain the current convergence while improving the distribution. Secondly, the cumulative number of generations in which a particle's personal best has not been updated is used to identify particles trapped in local optima. Combining opposition-based learning with the feature information of nearby particles in the archive, two opposition-based learning operators are proposed, one reducing the number of features and the other improving classification accuracy, to help particles escape local optima. Experiments on eight benchmark datasets show that MOPSO-CPBIOBL-FS outperforms existing algorithms; in particular, on high-dimensional datasets it retains more solutions in the edge regions and achieves better convergence in the central regions.
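For reference, the cyclic penalty strategy above adjusts the penalty parameter of the standard penalty-based boundary intersection (PBI) scalarizing function; a minimal sketch of its commonly used definition (not the thesis's exact cyclic schedule) is

$$ g^{\mathrm{pbi}}(x \mid \lambda, z^{*}) = d_{1} + \theta\, d_{2}, \qquad d_{1} = \frac{\bigl|(F(x)-z^{*})^{\mathsf{T}}\lambda\bigr|}{\lVert\lambda\rVert}, \qquad d_{2} = \Bigl\lVert F(x) - z^{*} - d_{1}\,\frac{\lambda}{\lVert\lambda\rVert} \Bigr\rVert, $$

where $\lambda$ is the weight vector of a sub-problem, $z^{*}$ is the ideal point, $d_{1}$ measures convergence along $\lambda$, and $d_{2}$ measures the deviation from $\lambda$. A loose (small) penalty $\theta$ emphasizes convergence, while a strict (large) $\theta$ forces solutions to stay close to their weight vectors and thus improves the distribution, which is the trade-off exploited by the two half-cycles described above.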
(2) To improve the performance of the algorithm on problems with discrete Pareto fronts, a multi-objective particle swarm optimization feature selection algorithm based on PBI decomposition with a spatial position penalty and hybrid perturbation (HPMOPSO-SPPBI-FS) is proposed. To address the reduced diversity on discrete Pareto fronts, the weight vectors are divided into central and edge weight vectors: different penalty strategies are applied to the central and edge regions of the central weight vectors to improve convergence, and different penalty terms are applied to the two sides of the edge weight vectors to improve the diversity of solutions. To improve the search performance of the multi-objective particle swarm optimization algorithm on high-dimensional datasets, a hybrid perturbation strategy is applied to the search particles. Particles that are not trapped in a local optimum receive a multistage normal random perturbation; particles that are trapped receive a larger perturbation combined with the feature information in the archive, and opposition-based learning is applied to their leader particles using the archive information to guide the search particles out of the local optimum. Experiments show that the feature subsets obtained by HPMOPSO-SPPBI-FS on eight benchmark datasets and six gene expression profile datasets are competitive; in particular, the multi-objective solutions on high-dimensional datasets are more uniformly distributed.
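As a rough, generic illustration (not the thesis's exact formulation), the sketch below shows the two ingredients named above, a standard opposition-based learning reflection and a stage-dependent normal perturbation, applied to a continuous-encoded feature-selection particle; the function names, bounds, shrinking schedule and the 0.5 selection threshold are assumptions introduced here for illustration.

import numpy as np

rng = np.random.default_rng(0)

def opposition(position, lower, upper):
    # Standard opposition-based learning: reflect the position within its bounds.
    return lower + upper - position

def multistage_normal_perturbation(position, stage, n_stages, scale=0.1, lower=0.0, upper=1.0):
    # Gaussian perturbation whose magnitude shrinks with the stage of the run
    # (a hypothetical schedule, not the setting used in the thesis).
    sigma = scale * (1.0 - stage / n_stages)
    perturbed = position + rng.normal(0.0, sigma, size=position.shape)
    return np.clip(perturbed, lower, upper)

# Toy usage: a particle position in [0, 1]^d; feature j is selected when x[j] > 0.5.
d = 10
x = rng.random(d)
x_opposite = opposition(x, 0.0, 1.0)                  # candidate for a particle stuck in a local optimum
x_perturbed = multistage_normal_perturbation(x, stage=3, n_stages=10)
print((x > 0.5).astype(int))
print((x_opposite > 0.5).astype(int))
print((x_perturbed > 0.5).astype(int))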