Applied Automated Feature Engineering And Machine Learning In Predictive Analytics

Posted on: 2020-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: Chia Emmanuel Tungom
GTID: 2428330602468005
Subject: Computer application technology
Abstract/Summary:
Feature engineering is a tedious and time-consuming step in building a machine learning pipeline. Automated machine learning (AutoML) is growing into a field in its own right, seeking to reduce the complexity and effort of feature engineering and model building as demand for these systems in both industry and academia continues to grow. Feature engineering is vital and can be the difference between success and failure in machine learning, yet there are no standard methods for it and the practice is highly domain-specific; experts have described it as more of an art than a science. Machine learning and feature engineering often require an in-depth understanding of the problem and the data, which makes them difficult to automate. In recent years, the data science community has done significant work towards automating the process.

In this study, Deep Feature Synthesis (DFS), an automated feature engineering algorithm for relational data, is used to break the data down into smaller related entities and then generate new features (deep features) relative to a given target entity (the prediction entity). The generated features are used to build prediction models with TPOT (Tree-based Pipeline Optimization Tool), XGBoost, Random Forest, KNN and Decision Tree. TPOT is an AutoML tool that optimizes machine learning pipelines; in this study it is used as a standalone prediction algorithm. AutoML systems are usually used to optimize the complete pipeline, but here we integrate automated feature engineering with AutoML, which enhances the performance of the AutoML algorithm.

The approach is applied to two datasets: an e-commerce dataset, used to predict customers' basket size, repeat basket size and return time, which are very important to e-commerce merchants; and a mobile phone event dataset, used to predict a user's gender and age group. The results are compared against a baseline and show that automated feature engineering enhances the AutoML algorithm, which performs comparably to XGBoost even without any feature selection or hyper-parameter optimization. The results show how useful automated feature engineering and machine learning are, and how closely they approach the work of a data science expert.
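The idea of generating deep features relative to a target entity can be illustrated with a small sketch. It assumes a hypothetical customers → orders → order items schema and a hypothetical set of aggregation primitives (COUNT, SUM, MEAN, MAX); it shows the stacking principle of DFS (child features, including already aggregated ones, are aggregated onto the parent) and is not the implementation used in this study.

```python
from statistics import mean

# Hypothetical toy relational data: customers -> orders -> order items.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]
items = [
    {"item_id": 100, "order_id": 10, "price": 5.0},
    {"item_id": 101, "order_id": 10, "price": 15.0},
    {"item_id": 102, "order_id": 11, "price": 8.0},
    {"item_id": 103, "order_id": 12, "price": 20.0},
]

# Entity set: name -> (rows, primary key).
ENTITIES = {
    "customers": (customers, "customer_id"),
    "orders": (orders, "order_id"),
    "items": (items, "item_id"),
}
# Relationships: parent entity -> list of (child entity, foreign key in child).
RELATIONSHIPS = {
    "customers": [("orders", "customer_id")],
    "orders": [("items", "order_id")],
}
# Hypothetical aggregation primitives applied to each child feature.
AGG_PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max}


def dfs(entity, max_depth=2):
    """Return {primary_key: {feature_name: value}} for `entity`.

    Raw float columns are depth-0 features. Each child relationship adds
    COUNT(child) plus every primitive applied to the child's own features,
    which are computed recursively with one less level of depth.
    """
    rows, pk = ENTITIES[entity]
    features = {
        row[pk]: {k: v for k, v in row.items() if k != pk and isinstance(v, float)}
        for row in rows
    }
    if max_depth == 0:
        return features
    for child, fk in RELATIONSHIPS.get(entity, []):
        child_rows, child_pk = ENTITIES[child]
        child_feats = dfs(child, max_depth - 1)
        # Group the child's feature vectors by the parent key they reference.
        groups = {key: [] for key in features}
        for row in child_rows:
            groups[row[fk]].append(child_feats[row[child_pk]])
        for key, group in groups.items():
            features[key][f"COUNT({child})"] = len(group)
            names = sorted({name for fv in group for name in fv})
            for name in names:
                values = [fv[name] for fv in group if name in fv]
                for prim, fn in AGG_PRIMITIVES.items():
                    features[key][f"{prim}({child}.{name})"] = fn(values)
    return features


if __name__ == "__main__":
    feats = dfs("customers")
    print(feats[1]["COUNT(orders)"])                 # 2
    print(feats[1]["SUM(orders.SUM(items.price))"])  # 28.0
    print(feats[1]["MEAN(orders.COUNT(items))"])     # 1.5
```

A feature such as `MEAN(orders.COUNT(items))` (average number of items per order) has depth 2 because it crosses two relationships; in this toy schema it is the kind of customer-level input a basket-size model could use. Limiting `max_depth` keeps the number of generated features manageable.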
Keywords/Search Tags: Machine learning, predictive analytics, feature engineering, AutoML