
Research On Automatic Feature Engineering Algorithms For Classification Problems Of Categorical Features

Posted on: 2022-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
GTID: 2518306740482604
Subject: Computer Science and Technology
Abstract/Summary:
Automated machine learning (AutoML) aims to construct machine learning pipelines automatically, lowering the barrier to using machine learning. Feature engineering is one of the key steps in such a pipeline, because the quality of the features sets an upper limit on the performance an algorithm can reach. Traditional feature engineering relies on domain knowledge and human intervention, which makes it complex and costly, since it requires repeated trial and error. To automate feature generation, many studies have adopted automatic feature construction methods based on deep learning and achieved good results. These methods capture feature interactions with deep neural networks and quantify the relationships between features with attention coefficients. However, such attention-based feature generation methods ignore the influence of useless generated features: useful features make up only a small fraction of the large number of candidate feature combinations, and features generated by combining irrelevant or redundant features introduce noise and degrade model performance. To address these problems, this thesis draws on differentiable architecture search (DARTS), which relaxes a discrete search space into a continuous one, and on graph structure learning, which jointly learns an optimized graph structure and the model parameters. On this basis, it proposes two automatic feature generation models, one based on differentiable architecture search and one based on graph structure learning. The main work is as follows.

(1) Existing deep-learning-based feature generation methods easily generate redundant or irrelevant features. Inspired by DARTS, which relaxes the discrete search space and optimizes the architecture parameters by gradient descent, this thesis recasts feature selection as a structure optimization problem and proposes AFDAS, an automatic feature generation model based on differentiable architecture search. First, AFDAS defines discrete structure parameters that indicate whether each feature is selected for a feature interaction. These structure parameters are then optimized after relaxing the discrete space into a continuous one. In addition, several interaction operators, such as skip connection, are introduced to improve the representational power of the feature embeddings. Next, Gumbel Softmax and a collaboration mode among the interaction operators are introduced to mitigate the performance collapse that occurs when the target network is derived from the search network. Finally, the effectiveness of AFDAS is verified on four real-world datasets.

(2) AFDAS and attention-based feature generation methods do not make full use of the associations between features. To address this limitation, and inspired by graph structure learning, this thesis maps the original tabular data to graph-structured data and proposes AFGSL, an automatic feature generation model based on graph structure learning. AFGSL turns the problem of learning the relationships between features into the problem of learning an adjacency matrix. Moreover, using only global attention-based interaction loses useful information, while using only adjacency-matrix-based interaction requires a deeper network. To resolve this, AFGSL uses Q-learning to learn a policy for stacking interaction layers, so that an agent builds a suitable network from these two kinds of interaction. As a result, features can make efficient use of local or global information when interacting with one another. Experimental results on four real-world datasets demonstrate the effectiveness of AFGSL.
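The AFDAS idea of relaxing discrete "is this feature selected?" structure parameters can be illustrated with a minimal sketch. It assumes, hypothetically, that each candidate feature pair `(i, j)` carries two logits `[logit_drop, logit_keep]`, that a Gumbel-Softmax sample of those logits gives a soft keep-gate during search, and that the element-wise product stands in for an interaction operator. None of the names (`gated_pairwise_interactions`, `derive_selected_pairs`, `alphas`) come from the thesis, and no gradient step is shown; in a DARTS-style search the logits would be trained by gradient descent alongside the model weights. A commonly cited reason to use Gumbel-Softmax here is that low-temperature samples are close to one-hot, which narrows the gap between the relaxed search network and the discrete network derived from it.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)        # avoid log(0)
        gumbel = -math.log(-math.log(u))    # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def gated_pairwise_interactions(embeddings, alphas, tau=0.5, seed=0):
    """Relaxed search step: every candidate pair contributes, scaled by a soft 'keep' gate.

    embeddings: list of per-field embedding vectors of equal length.
    alphas: hypothetical structure parameters, {(i, j): [logit_drop, logit_keep]}.
    """
    rng = random.Random(seed)
    out = {}
    for (i, j), logits in alphas.items():
        keep = gumbel_softmax(logits, tau, rng)[1]  # soft gate in [0, 1]
        # Element-wise (Hadamard) product, used here as an illustrative interaction operator.
        out[(i, j)] = [keep * a * b for a, b in zip(embeddings[i], embeddings[j])]
    return out

def derive_selected_pairs(alphas):
    """Derivation step: keep a pair in the target network only if its 'keep' logit wins."""
    return sorted(pair for pair, (drop, keep) in alphas.items() if keep > drop)

emb = [[1.0, 0.5], [0.2, -1.0], [0.3, 0.3]]
alphas = {(0, 1): [-2.0, 2.0], (0, 2): [1.5, -1.5], (1, 2): [0.0, 0.1]}
print(derive_selected_pairs(alphas))  # [(0, 1), (1, 2)]
```

Lowering `tau` pushes each gate toward 0 or 1, so the relaxed network behaves more like the discrete one that `derive_selected_pairs` produces.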
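The contrast AFGSL draws between global attention-based interaction and adjacency-matrix-based interaction can be sketched on a handful of field embeddings. This is an illustration under stated assumptions, not the thesis's layers: `global_attention_layer` lets every field attend to every field with plain dot-product scores, and `adjacency_layer` lets field `i` aggregate only the fields `j` with a positive weight `adj[i][j]`. In AFGSL the adjacency matrix is learned jointly with the model; here `adj` is a fixed, hand-written, non-negative matrix, and both function names are hypothetical.

```python
import math

def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention_layer(h):
    """Global interaction: every field attends to every field via dot-product scores."""
    dim = len(h[0])
    out = []
    for hi in h:
        weights = _softmax([sum(a * b for a, b in zip(hi, hj)) for hj in h])
        out.append([sum(w * hj[k] for w, hj in zip(weights, h)) for k in range(dim)])
    return out

def adjacency_layer(h, adj):
    """Local interaction: field i takes a weighted mean of fields j with adj[i][j] > 0."""
    dim = len(h[0])
    out = []
    for i, hi in enumerate(h):
        total = sum(adj[i])
        if total == 0:
            out.append(list(hi))  # no neighbours: pass the embedding through unchanged
            continue
        out.append([sum(w * hj[k] for w, hj in zip(adj[i], h)) / total for k in range(dim)])
    return out

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [0, 0, 0], [1, 1, 0]]
print(adjacency_layer(h, adj))  # [[0.0, 1.0], [0.0, 1.0], [0.5, 0.5]]
print(global_attention_layer(h))
```

A single adjacency layer only mixes direct neighbours, so reaching two-hop relationships takes a second layer; this is one way to read the remark that adjacency-only interaction needs a deeper network, while global attention mixes all fields at once and can blur local structure.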
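The Q-learning component of AFGSL, which learns how to stack interaction layers, can be sketched with a tabular agent. The abstract does not specify the state, action, or reward design, so everything here is an assumption: the state is the tuple of layers stacked so far, the actions are "add an attention layer", "add an adjacency layer", or "stop", and the reward is a hypothetical `toy_reward` standing in for the validation performance of the trained network. `legal_actions`, `train_stacking_policy`, and `greedy_stack` are illustrative names, not the thesis's code.

```python
import random

LAYER_TYPES = ("attention", "adjacency")  # hypothetical layer vocabulary

def legal_actions(state, max_layers):
    """Actions available after stacking the layers in `state` (a tuple)."""
    if not state:
        return LAYER_TYPES                # stack at least one layer
    if len(state) >= max_layers:
        return ("stop",)
    return LAYER_TYPES + ("stop",)

def train_stacking_policy(reward_fn, max_layers=3, episodes=3000, lr=0.1, eps=0.3, seed=0):
    """Tabular Q-learning over layer stacks; reward_fn scores a finished stack."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return
    for _ in range(episodes):
        state = ()
        while True:
            acts = legal_actions(state, max_layers)
            if rng.random() < eps:        # epsilon-greedy exploration
                action = rng.choice(acts)
            else:
                action = max(acts, key=lambda a: q.get((state, a), 0.0))
            old = q.get((state, action), 0.0)
            if action == "stop":
                q[(state, action)] = old + lr * (reward_fn(state) - old)  # terminal reward
                break
            nxt = state + (action,)
            # No intermediate reward, discount 1: bootstrap from the best next action.
            target = max(q.get((nxt, a), 0.0) for a in legal_actions(nxt, max_layers))
            q[(state, action)] = old + lr * (target - old)
            state = nxt
    return q

def greedy_stack(q, max_layers=3):
    """Read off the learned architecture by always taking the highest-valued action."""
    state = ()
    while True:
        acts = legal_actions(state, max_layers)
        action = max(acts, key=lambda a: q.get((state, a), 0.0))
        if action == "stop":
            return state
        state = state + (action,)

def toy_reward(stack):
    """Hypothetical stand-in for the validation performance of the trained network."""
    return 1.0 if stack == ("adjacency", "attention") else 0.0

q = train_stacking_policy(toy_reward)
print(greedy_stack(q))
```

With this sparse toy reward, the greedy read-out should recover the single rewarded stack. A real setup would train and evaluate a network for each proposed stack, which is far more expensive, so the number of episodes would have to be much smaller.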
Keywords/Search Tags: Automatic Feature Engineering, Categorical Feature, Architecture Search, Graph Structure Learning