
Research On Multi-label Feature Selection Algorithms Based On Normalized Cross-Covariance Operator

Posted on: 2018-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: H L Yuan
GTID: 2348330518490379
Subject: Computer technology

Abstract/Summary:
In multi-label classification, each sample is associated with multiple labels, and these labels may overlap and be correlated with one another. Multi-label data sets often contain thousands or even tens of thousands of features. Irrelevant and redundant features increase the computational cost and degrade classification performance. Multi-label feature selection, an effective and widely used data preprocessing technique, addresses this problem of high-dimensional multi-label data. The normalized cross-covariance operator (NOCCO), which has been used to capture dependence between variables, can measure the correlation between a feature set and a label set: the higher the NOCCO value, the stronger the correlation between the two sets. Multi-label feature selection techniques are currently divided into three broad categories: filter, wrapper, and embedded methods. Focusing on filter methods, this thesis proposes two families of multi-label feature selection methods: (1) methods based on NOCCO and greedy search (BANOCCO, FONOCCO, FRNOCCO); and (2) methods based on NOCCO and a genetic algorithm with a control strategy (CGANOCCO, FRNOCCO).

For BANOCCO, FONOCCO, and FRNOCCO, the training and test data are first normalized, and NOCCO then evaluates the dependence between features and labels. Three search strategies, namely sequential forward selection, sequential backward selection, and ranking by feature weight, are applied to the original feature space to find a feature subset of fixed size. In the experiments, these methods are compared with three existing algorithms on six benchmark data sets using ten evaluation metrics.
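The NOCCO evaluation and the three fixed-size search strategies described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis's implementation: it uses a commonly used empirical NOCCO estimator, Tr[R_X R_Y] with R = G(G + nεI)^{-1} built from centered Gram matrices G; it assumes a Gaussian kernel (bandwidth `sigma`) on standardized features, a linear kernel on the 0/1 label matrix, and a regularization constant `eps`; and it assumes the "feature weight" used for ranking is each feature's individual NOCCO score. The function names and default values are hypothetical.

```python
import numpy as np


def centered_gram(Z, kernel="rbf", sigma=1.0):
    """Centered Gram matrix H K H with H = I - (1/n) 11^T (hypothetical helper)."""
    n = Z.shape[0]
    if kernel == "rbf":
        # Assumed Gaussian kernel on (standardized) feature columns.
        sq = np.sum(Z ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
        K = np.exp(-d2 / (2.0 * sigma ** 2))
    else:
        # Assumed linear kernel, used here for the binary label matrix.
        K = Z @ Z.T
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def nocco(X, Y, eps=1e-3, sigma=1.0):
    """Empirical NOCCO estimate Tr[R_X R_Y], with R = G (G + n*eps*I)^{-1}."""
    n = X.shape[0]
    reg = n * eps * np.eye(n)
    Gx = centered_gram(X, "rbf", sigma)
    Gy = centered_gram(Y, "linear")
    # G and (G + reg) commute, so G (G + reg)^{-1} equals solve(G + reg, G).
    Rx = np.linalg.solve(Gx + reg, Gx)
    Ry = np.linalg.solve(Gy + reg, Gy)
    return float(np.trace(Rx @ Ry))


def forward_select(X, Y, k, eps=1e-3):
    """Sequential forward selection: greedily add the feature that maximizes NOCCO."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        scores = [nocco(X[:, selected + [j]], Y, eps) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_select(X, Y, k, eps=1e-3):
    """Sequential backward selection: drop features until only k remain."""
    selected = list(range(X.shape[1]))
    while len(selected) > k:
        # Score each candidate removal by the NOCCO of the features left behind.
        scores = [nocco(X[:, [f for f in selected if f != j]], Y, eps) for j in selected]
        selected.remove(selected[int(np.argmax(scores))])
    return selected


def rank_select(X, Y, k, eps=1e-3):
    """Rank features by their individual NOCCO score and keep the top k."""
    scores = [nocco(X[:, [j]], Y, eps) for j in range(X.shape[1])]
    return [int(j) for j in np.argsort(scores)[::-1][:k]]
```

In use, `X` would be the standardized training feature matrix (n samples × d features) and `Y` the n × q binary label matrix. Each NOCCO evaluation solves n × n linear systems, so ranking needs only d evaluations, forward selection roughly d·k, and backward selection roughly (d² − k²)/2, which makes the greedy variants far costlier when d runs into the thousands.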
The experimental results indicate that the proposed feature selection algorithms select higher-quality features. For CGANOCCO, NOCCO evaluates the dependence between features and labels, and a genetic algorithm with a control strategy governs the number of features searched in each iteration until the algorithm converges to the global optimal solution. In the experiments, two existing multi-label feature selection methods, CGAHSIC and FRHSIC, are compared with the proposed algorithms on eight data sets. The experimental results show that CGANOCCO and FRNOCCO achieve better classification performance.
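The CGANOCCO search described above, a genetic algorithm whose fitness is the NOCCO score of a candidate feature subset, can be sketched as follows. The abstract does not specify the control strategy, so this sketch assumes a hypothetical one: every chromosome holds exactly `k` feature indices, the better half of the population survives each generation (elitist truncation selection), crossover draws `k` distinct genes from the union of two parents, and mutation swaps one gene for an unused feature. The NOCCO estimator, kernels, and all names and defaults (`ga_select`, `pop_size`, `generations`, `p_mut`) are assumptions, not the thesis's code.

```python
import numpy as np


def _centered_gram(Z, kernel, sigma=1.0):
    """Centered Gram matrix H K H (assumed Gaussian or linear kernel)."""
    n = Z.shape[0]
    if kernel == "rbf":
        sq = np.sum(Z ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
        K = np.exp(-d2 / (2.0 * sigma ** 2))
    else:
        K = Z @ Z.T
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def nocco(X, Y, eps=1e-3, sigma=1.0):
    """Empirical NOCCO estimate Tr[R_X R_Y], with R = G (G + n*eps*I)^{-1}."""
    n = X.shape[0]
    reg = n * eps * np.eye(n)
    Gx = _centered_gram(X, "rbf", sigma)
    Gy = _centered_gram(Y, "linear")
    Rx = np.linalg.solve(Gx + reg, Gx)
    Ry = np.linalg.solve(Gy + reg, Gy)
    return float(np.trace(Rx @ Ry))


def ga_select(X, Y, k, pop_size=20, generations=30, p_mut=0.2, seed=0):
    """Hypothetical GA: each chromosome is a sorted array of exactly k feature indices."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    half = pop_size // 2

    def fitness(ind):
        return nocco(X[:, ind], Y)

    pop = [np.sort(rng.choice(d, size=k, replace=False)) for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        # Elitist truncation selection: the better half survives unchanged.
        order = np.argsort(scores)[::-1][:half]
        elite = [pop[i] for i in order]
        elite_scores = [scores[i] for i in order]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(half, size=2, replace=False)
            # Crossover: draw k distinct genes from the union of two parents,
            # so every child keeps exactly k features (the assumed "control").
            pool = np.union1d(elite[a], elite[b])
            child = rng.choice(pool, size=k, replace=False)
            # Swap mutation: replace one gene with a feature not in the child.
            outside = np.setdiff1d(np.arange(d), child)
            if outside.size > 0 and rng.random() < p_mut:
                child[rng.integers(k)] = rng.choice(outside)
            children.append(np.sort(child))
        pop = elite + children
        scores = elite_scores + [fitness(c) for c in children]
    best = int(np.argmax(scores))
    return [int(j) for j in pop[best]], scores[best]
```

Because the elite half is carried over unchanged, the best fitness never decreases from one generation to the next; like any genetic algorithm, however, this sketch offers no formal guarantee of reaching the global optimum in a fixed number of generations.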
Keywords/Search Tags: Multi-label classification, feature selection, normalized cross-covariance operator, Hilbert-Schmidt independence criterion, genetic algorithm