Automatic Feature Engineering In Supervised Learning

Posted on:2019-07-03

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Zhang

Full Text:PDF

GTID:2428330626452115

Subject:Software engineering

Abstract/Summary:


Feature engineering is one of the most difficult and time-consuming tasks in data mining projects, and it requires strong expert knowledge. To limit computation time, existing feature engineering techniques tend to use a limited number of simple features and to validate their approach on simple datasets, which limits the benefits of feature engineering. In this paper, we propose a general Automatic Feature Engineering Machine framework, or AFEM for short, which defines families of complex features. We show that this framework covers most of the features used in existing feature engineering techniques and allows us to efficiently generate complex feature families: in particular, we present time-based and social network-based families for relational and graph datasets, as well as compositions of features. We derive features by introducing them one family at a time (block bottom-up) and selecting the most promising ones, thus mitigating the computation cost of top-down approaches. We validate our approach on realistic datasets: two data science competitions and a recommendation system task with a social network. In the two competitions, AFEM ranked 15th and 12th among human teams; in the recommendation task, it achieved a 1.5% reduction in regression error compared to the best results found in the literature. Furthermore, we analyze the trade-off between computation time and number of features/performance in the context of big data and web applications: in one case study, we reduce computation time by two thirds with only a 0.2% loss in AUC.

Because feature engineering (FE) is so difficult and time-consuming, it is important to design a generic and automatic way to perform it. The primary difficulties arise from the many forms of information to consider, the potentially infinite number of possible features, and the time cost of feature generation and evaluation. We present a framework called Light Automatic Feature Engineering Machine (LAFEM), which organizes the FE problem as a Heterogeneous Transformation Graph (HTG) and then searches for the optimal solution using Deep Q-Learning and Long Short-Term Memory (LSTM). We compare LAFEM with several existing state-of-the-art automatic FE techniques on a large collection of 200 datasets and show that LAFEM almost always outperforms them in model accuracy and time efficiency.
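
The block bottom-up derivation mentioned in the abstract (introduce one feature family at a time, keep only the most promising candidates, then let later families build on what was kept) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the names `block_bottom_up`, `squares`, and `products` are hypothetical, the two toy families stand in for the time-based and social network-based families, and ranking candidates by absolute Pearson correlation with the target is a placeholder for whatever selection criterion AFEM actually uses.

```python
from math import sqrt
from typing import Callable, Dict, List

Columns = Dict[str, List[float]]          # column name -> values
Family = Callable[[Columns], Columns]     # builds candidate columns from current ones


def abs_pearson(xs: List[float], ys: List[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / sqrt(vx * vy)


def squares(cols: Columns) -> Columns:
    """Toy family (assumption): square every current column."""
    return {f"{name}^2": [v * v for v in vals] for name, vals in cols.items()}


def products(cols: Columns) -> Columns:
    """Toy family (assumption): pairwise products of current columns."""
    names = sorted(cols)
    return {
        f"{a}*{b}": [x * y for x, y in zip(cols[a], cols[b])]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def block_bottom_up(base: Columns, target: List[float],
                    families: List[Family], keep_per_family: int) -> Columns:
    """Add one family (block) at a time, keeping only its best candidates."""
    selected = dict(base)
    for family in families:
        # Candidates are built from everything kept so far, so later
        # families can compose features produced by earlier ones.
        candidates = family(selected)
        ranked = sorted(candidates,
                        key=lambda name: abs_pearson(candidates[name], target),
                        reverse=True)
        for name in ranked[:keep_per_family]:
            selected[name] = candidates[name]
    return selected


if __name__ == "__main__":
    base = {"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 1.0, 3.0, 2.0]}
    target = [x * y for x, y in zip(base["a"], base["b"])]
    kept = block_bottom_up(base, target, [products, squares], keep_per_family=1)
    print(sorted(kept))
```

Pruning after each block keeps the candidate set small before the next family is applied, which is the intuition behind the reduced computation cost the abstract attributes to this approach.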

Keywords/Search Tags:

Feature Engineering, Machine Learning, Data Mining, Reinforcement Learning


Related items

1	Implementation Of Task Structure Utilization In Four Machine Learning Tasks
2	Research On Application Of Machine Learning And Data Mining In Bioinformatics
3	Research On Decision Distribution Modeling In Reinforcement Learning
4	Automatic Feature Engineering System For Tabular Data
5	Supervised Reinforcement Learning: Methods And Applications
6	Stock Representation And Quantitative Trading Based On Machine Learning Methods
7	Analysing Correctness Of Implementations Of Machine Learning Algorithms By Machine Learning
8	SVM-Based Data Mining Technology Research
9	The Decomposition And Reconstruction Of Complex Environment In Reinforcement Learning
10	Power-Aware Traffic Engineering With Deep Reinforcement Learning