
Optimization Of Nearest Neighbor Preserving Feature Selection In Multi-label Classification

Posted on: 2021-03-08    Degree: Master    Type: Thesis
Country: China    Candidate: X Xiao    Full Text: PDF
GTID: 2518306200453204    Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of Internet technology, large volumes of multi-label data have appeared in applications such as text classification, image and video topic annotation, biological gene function prediction, and music sentiment classification. Classifying these data effectively has become a research hotspot in the field of pattern recognition. The interdependence between class labels and attribute features in multi-label data is complex, which leads to unsatisfactory classification accuracy and high computational complexity. At the same time, the many redundant and irrelevant features in multi-label data directly degrade subsequent classification performance and further increase complexity. Multi-label feature selection is therefore an important part of the classification task and can effectively improve overall performance.

Addressing both the feature selection and the classification of multi-label data, this thesis proposes a neighbor-preserving feature selection and multi-label classification algorithm (NPFS/MLC), which consists of two stages.

Feature selection stage: the neighbor-preserving feature selection sub-algorithm (NPFS) first constructs similarity matrices over the feature subspace and the label space, and establishes a similarity-preservation expression based on these matrices. It then linearly extends the similarity-preservation formula into a neighbor-preservation formula and computes a neighbor preservation (NP) score to evaluate candidate feature subsets. Finally, a greedy search selects the most important feature subset.

Classification stage: the multi-label classification sub-algorithm (MLC) performs the classification training. It constructs similarity functions over instances and predicts each instance's label set either by a weighted decision function or by a learned threshold function.

Using the two sub-algorithms together, the important features of the original data are selected first and the subsequent classification is then carried out on them, effectively improving overall performance.

To verify the effectiveness of the algorithm, two Mulan multi-label data sets of different sizes, bibtex and mediamill, were selected for simulation on the Python platform. First, the proposed NPFS is compared with PMU and MDMR by measuring the performance of the MLKNN classifier on data processed by each feature selection algorithm. Second, data after feature selection with NPFS is fed to different classification algorithms, MLC, MLKNN, and Rank-SVM, to compare their performance. Finally, the time complexity of applying the NPFS, PMU, and MDMR feature selection algorithms to multi-label classification is verified. Simulation results show that NPFS improves five commonly used multi-label performance indicators, indicating that it can effectively improve classification performance; in multi-label classification, MLC outperforms the traditional Rank-SVM and MLKNN algorithms, with the gain most pronounced on large-scale data sets with imbalanced categories; and NPFS combined with MLC reduces classification complexity. The NPFS/MLC algorithm proposed in this thesis, applied to multi-label classification, effectively reduces time complexity while maintaining high classification accuracy.
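The abstract does not give the concrete formulas, so the following is only a minimal Python sketch of the neighbor-preservation idea under stated assumptions: cosine similarity is used in both the feature subspace and the label space, the NP score is taken as the mean elementwise product of the two similarity matrices, and feature selection is a greedy forward search. The function names and the particular scoring choice are illustrative, not the thesis's actual definitions.

```python
import numpy as np

def similarity(M):
    """Cosine similarity between the rows of M (an assumed choice;
    the thesis does not specify its similarity function)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # guard against all-zero rows
    U = M / norms
    return U @ U.T

def np_score(X_sub, Y):
    """Illustrative neighbor-preservation (NP) score: agreement between
    instance similarities in the feature subspace and in label space,
    measured as the mean elementwise product of the two matrices."""
    return float(np.mean(similarity(X_sub) * similarity(Y)))

def npfs(X, Y, k):
    """Greedy forward search: repeatedly add the feature whose inclusion
    maximizes the NP score until k features have been selected."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda f: np_score(X[:, selected + [f]], Y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

On data where one feature's values track the label structure and the others are noise, this sketch selects the label-aligned feature first, which is the behavior the NP score is meant to reward.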
Keywords/Search Tags: Multi-label classification, Feature selection, Neighbor preservation, NPFS/MLC