
Hierarchical Feature Selection Algorithm Based On Category Information

Posted on: 2024-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhang
GTID: 2568307064455914
Subject: Computer application technology
Abstract/Summary:
Data mining and machine learning offer many promising feature selection algorithms for a wide range of applications. However, traditional feature selection algorithms face serious challenges as data grows in both volume and dimensionality. In addition, the categories in the label space are often related hierarchically, which makes the problem more complex. Hierarchical classification learning is an effective way to model these structural relationships between categories. It uses the relationships between categories to build a hierarchy and breaks the original, complex multi-class classification problem into multiple subtasks. This mitigates the high dimensionality of the data, reduces the difficulty of modelling, and improves the efficiency and accuracy of classification. This thesis studies hierarchical feature selection, and the main research work is as follows:

(1) Hierarchical feature selection algorithm based on category consistency. Traditional feature selection algorithms ignore the hierarchical relationships between categories. To address this, a hierarchy-oriented feature selection algorithm is proposed that makes full use of the sibling relationships between nodes in the category hierarchy. First, the algorithm uses recursive regularization to learn common features for each internal node of the category hierarchy. Second, it analyzes the output consistency between categories by exploiting the hierarchical structure and constraining the similarity of categories on the output labels. Finally, it learns a sparse representation of the sample features to remove irrelevant features. The algorithm can handle data with both tree and directed acyclic graph structures. Experimental results with a Linear Support Vector Machine (LSVM) classifier show that the algorithm performs well on all evaluation metrics and improves classification performance, supporting its effectiveness.

(2) Hierarchical feature selection algorithm combining class information constraints. Categories in a hierarchy are closely related: a parent class is coarse-grained relative to its child classes, while a child class is fine-grained relative to its parent class. A hierarchical feature selection method that combines category information constraints is proposed to exploit this relationship. The basic idea is to convert the information shared between categories into an effective regularization term that is added to the learning model to improve feature selection. First, sparsity learning is performed for each node based on the hierarchy, together with the learning of common features. Second, because two features within a sample can be strongly correlated, a regularization term based on the information constraints between categories is added. The proposed algorithm achieves superior results on some metrics with the LSVM classifier, supporting its effectiveness.
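
The abstract does not state the objective function, so the following is only a rough sketch of the sparsity-learning and recursive-regularization ideas it mentions, not the thesis's actual algorithm. It assumes a least-squares loss at a single hierarchy node, an L2,1-norm penalty for row sparsity, and a Frobenius penalty that pulls the node's weight matrix toward its parent's. The names l21_feature_scores, lam, alpha and W_parent are hypothetical.

```python
import numpy as np


def l21_feature_scores(X, Y, W_parent=None, lam=1.0, alpha=0.0,
                       n_iter=30, eps=1e-8):
    """Assumed objective for one hierarchy node (not taken from the thesis):

        min_W ||X W - Y||_F^2 + lam * ||W||_{2,1} + alpha * ||W - W_parent||_F^2

    Solved by iteratively reweighted least squares.
    Returns the weight matrix W and one score per feature (row norms of W).
    """
    d = X.shape[1]
    c = Y.shape[1]
    if W_parent is None:
        # Root node: no parent to stay close to, so disable that term.
        W_parent = np.zeros((d, c))
        alpha = 0.0

    XtX = X.T @ X
    XtY = X.T @ Y
    D = np.eye(d)  # reweighting matrix, initialised to the identity

    for _ in range(n_iter):
        # Setting the gradient to zero gives a linear system in W.
        A = XtX + lam * D + alpha * np.eye(d)
        W = np.linalg.solve(A, XtY + alpha * W_parent)
        # Rows with small norm get a large penalty weight in the next pass,
        # which drives irrelevant features toward zero.
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

    return W, row_norms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    # Only features 0 and 3 carry signal in this synthetic example.
    B = np.array([[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]])
    Y = X[:, [0, 3]] @ B + 0.01 * rng.normal(size=(200, 3))
    W, scores = l21_feature_scores(X, Y, lam=1.0)
    print("features ranked by score:", np.argsort(-scores).tolist())
```

Features would then be ranked by their scores and the top-ranked ones passed to a classifier such as a linear SVM. In a full hierarchy, one such problem would be solved per internal node, with W_parent set to the parent node's solution.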
Keywords/Search Tags: hierarchical data, hierarchical feature selection, label-specific feature, category hierarchy information