Automatic Feature Engineering In Supervised Learning

Posted on: 2019-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhang
GTID: 2428330626452115
Subject: Software engineering
Abstract/Summary:
Feature engineering is one of the most difficult and time-consuming tasks in data mining projects, and it requires strong expert knowledge. To limit computation time, existing feature engineering techniques tend to use a limited number of simple features and to validate their approach on simple datasets, but this limits the benefits of feature engineering. In this paper, we propose a general Automatic Feature Engineering Machine framework, AFEM for short, which defines families of complex features. We show that this framework covers most of the features used in existing feature engineering techniques and allows us to generate complex feature families efficiently: in particular, we present time-based and social-network-based families for relational and graph datasets, as well as compositions of features. We derive features by introducing them one family at a time (block bottom-up) and selecting the most promising ones, thus mitigating the computation cost of top-down approaches. We validate our approach on realistic datasets: two data science competitions and a recommendation system task with a social network. In the two competitions, AFEM ranked 15th and 12th among human teams; in the recommendation task, it achieved a 1.5% reduction in regression error compared with the best results found in the literature. Furthermore, we analyze the trade-off between computation time and the number of features and performance in the context of big data and web applications: in one case study, computation time is reduced by two thirds with only a 0.2% loss in AUC.

Given these costs, it is important to design a generic and automatic way to perform feature engineering (FE). The primary difficulties arise from the many forms of information to consider, the potentially infinite number of possible features, and the time cost of feature generation and evaluation. We present a framework called Light Automatic Feature Engineering Machine (LAFEM), which organizes the FE problem as a Heterogeneous Transformation Graph (HTG) and then searches for the optimal solution using Deep Q-Learning and Long Short-Term Memory (LSTM). We compare LAFEM with several existing state-of-the-art automatic FE techniques on a large collection of 200 datasets and show that it almost always outperforms them in model accuracy and time efficiency.
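
The block bottom-up derivation mentioned in the abstract (introduce one feature family at a time, keep only the most promising features, then build the next family on top of what was kept) can be illustrated with a minimal Python sketch. The abstract does not specify the families, the scoring criterion, or the selection budget, so everything named here is an assumption made for illustration: the families `squares` and `pairwise_products`, the absolute Pearson correlation used as the "promising" score, and the `keep_per_family` parameter are hypothetical and are not taken from the thesis.

```python
from math import sqrt
from typing import Callable, Dict, List

Column = List[float]
Features = Dict[str, Column]
Family = Callable[[Features], Features]


def abs_pearson(x: Column, y: Column) -> float:
    """Absolute Pearson correlation; 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def squares(selected: Features) -> Features:
    """Hypothetical family: the square of every currently selected feature."""
    return {f"{name}^2": [v * v for v in col] for name, col in selected.items()}


def pairwise_products(selected: Features) -> Features:
    """Hypothetical family: products of every pair of selected features."""
    names = sorted(selected)
    return {
        f"{a}*{b}": [u * v for u, v in zip(selected[a], selected[b])]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def block_bottom_up(
    base: Features,
    target: Column,
    families: List[Family],
    keep_per_family: int,
) -> Features:
    """Add one family at a time, keeping only its top-scoring candidates.

    Scoring by correlation with the target is an assumption of this sketch,
    not the criterion used in the thesis.
    """
    selected = dict(base)
    for family in families:
        candidates = family(selected)  # built only from features kept so far
        ranked = sorted(
            candidates,
            key=lambda name: abs_pearson(candidates[name], target),
            reverse=True,
        )
        for name in ranked[:keep_per_family]:
            selected[name] = candidates[name]
    return selected


if __name__ == "__main__":
    base = {"x": [1.0, 2.0, 3.0, 4.0, 5.0], "z": [1.0, -1.0, 1.0, -1.0, 1.0]}
    target = [1.0, -2.0, 3.0, -4.0, 5.0]  # equals x * z
    kept = block_bottom_up(base, target, [squares, pairwise_products], 1)
    print(sorted(kept))
```

Because each family is generated only from the features that survived earlier rounds, the number of candidates evaluated stays bounded by the selection budget rather than growing with every possible composition, which is the cost argument the abstract makes against top-down approaches.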
Keywords/Search Tags:Feature Engineering, Machine Learning, Data Mining, Reinforcement Learning