Research On Automated Feature Engineering Algorithms For Classification Problems Of Numerical Features

Posted on: 2021-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Cai
GTID: 2518306476952959
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, automated machine learning has become a new sub-field of machine learning, and every step of the machine learning pipeline can be developed in the direction of automation. Among these steps, feature engineering is one of the main difficulties in applying AI in industry, since the quality of the features is the foundation of the subsequent learning models. Because raw features rarely lead to satisfactory results, it is often necessary to generate features manually to better represent the data and improve learning performance. This work is usually tedious and task-specific, which has motivated research on automated feature generation. Most early work on automated feature generation built features by combining strictly pre-defined operations, which made the methods less scalable; later, deep learning methods that implicitly learn higher-order feature interactions appeared, but these models lacked interpretability. To this end, we propose Tide Kit, an automated feature construction framework that learns high-order interactions of the input features automatically, is broadly applicable to classification problems with numerical features, and has good model interpretability. The main work of the thesis is as follows:

(1) In terms of feature generation, we propose a new feature combination method based on the self-attention mechanism, implemented in the interaction layer of the model. In each interaction layer, higher-order features are combined through the attention mechanism, and the different combinations can be evaluated using the self-attention scores, so the learning process is interpretable. By stacking multiple interaction layers, combinations of the raw features of different orders can be modeled, and the process is fully automated.

(2) In terms of feature selection, we propose a novel feature selection method based on reinforcement learning. The feature selection process is formulated as a Markov Decision Process (MDP). The candidate probability of each feature is evaluated in parallel using policy gradients; by iteratively exploring and exploiting the generated features, the method arrives within a limited number of steps at a globally optimal feature generation and selection scheme, which then guides feature generation on the test set. In addition, we propose a new method based on meta-features for warm starting and for differentiating the rewards of individual features, and we establish a dynamic automated adjustment mechanism that improves iteration efficiency.

We performed extensive experiments on eight real-world datasets. The experimental results show that the proposed method not only outperforms recent prediction methods but also has good model interpretability. In addition, the dynamic auto-adjustment mechanism gives the model better convergence.
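The interaction layer in contribution (1) can be illustrated with a minimal sketch of scaled dot-product self-attention over per-feature embeddings. This is not the Tide Kit implementation, which the abstract does not specify: the embedding scheme (each numerical value scales a per-feature vector), the single attention head, the random untrained weights, and every name below (`interaction_layer`, `feature_vectors`, `layer1`, `layer2`) are assumptions made for illustration only.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a d_in x d_out matrix."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec))
            for j in range(len(mat[0]))]


def interaction_layer(embeddings, w_q, w_k, w_v):
    """One single-head self-attention layer over feature embeddings.

    Returns the combined features and the attention matrix; row i of the
    matrix says how strongly feature i attends to every feature, which is
    the quantity one would inspect for interpretability.
    """
    queries = [matvec(e, w_q) for e in embeddings]
    keys = [matvec(e, w_k) for e in embeddings]
    values = [matvec(e, w_v) for e in embeddings]
    scale = math.sqrt(len(keys[0]))
    attention = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]
    combined = [
        [sum(a * v[j] for a, v in zip(row, values))
         for j in range(len(values[0]))]
        for row in attention
    ]
    return combined, attention


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights: small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4                                  # embedding width (assumed)
sample = [0.7, -1.2, 3.0]              # one row with three numerical features

# Assumed embedding: each numerical value scales its own feature vector.
feature_vectors = random_matrix(len(sample), d, rng)
embeddings = [[x * w for w in vec] for x, vec in zip(sample, feature_vectors)]

# Two stacked layers: the second combines already-combined features,
# so it can represent higher-order interactions than the first.
layer1 = [random_matrix(d, d, rng) for _ in range(3)]
layer2 = [random_matrix(d, d, rng) for _ in range(3)]
h1, attn1 = interaction_layer(embeddings, *layer1)
h2, attn2 = interaction_layer(h1, *layer2)

for i, row in enumerate(attn1):
    print(f"feature {i} attends to:", [round(a, 3) for a in row])
```

In a trained model the weight matrices would be learned jointly with the classifier, and the attention rows would then indicate which feature combinations the model relies on.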
Keywords/Search Tags: Numerical features, Automated feature engineering, Self-attention, Reinforcement learning