
Research on Automatic Feature Engineering and Parameter Adjustment Algorithms

Posted on: 2019-07-19   Degree: Master   Type: Thesis
Country: China   Candidate: H Zhang   Full Text: PDF
GTID: 2348330569995772   Subject: Engineering
Abstract/Summary:
The traditional data analysis process usually involves the following steps: data acquisition, data cleaning, data sampling and preprocessing, feature transformation, feature construction, feature selection, model selection and training, and model evaluation. Feature transformation, feature construction, and feature selection are collectively referred to as feature engineering. Feature engineering aims to extract important features from the raw data for use in training the model, and it plays an important role in data mining performance. However, this process often requires human involvement, and the final result depends to a large extent on the intuition of the data engineer and the experience of the relevant domain experts. Data mining algorithms are also important: they learn from the features and discover hidden patterns. The performance of data mining models varies greatly with the values of their hyperparameters, so the hyperparameters must be tuned carefully. Hyperparameter tuning also requires human participation, and the final result depends heavily on the experience of the data engineer. In sum, traditional data mining processes rely too much on the personal experience and intuition of data engineers. Meanwhile, with the development of the Internet and information technology in recent years, the world has entered the era of big data. More and more fields urgently need experienced data scientists, but the number of data scientists in different fields and the speed at which they are trained have not kept up with the growth of big data. The goal of this thesis is to automate the data mining process, especially feature engineering and hyperparameter tuning. The thesis is divided into two main parts. First, a linear feature combination algorithm based on AdaBoost and a nonlinear feature combination algorithm based on gradient boosting trees are adopted to automatically generate features for structured data. Second, a Bayesian optimization method based on Gaussian processes is applied to automatically tune the hyperparameters of data mining algorithms.
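To make the second part more concrete, the following is a minimal sketch of Bayesian optimization with a Gaussian process surrogate, written with the Python standard library only. It is not the thesis's implementation: the abstract does not name a kernel or an acquisition function, so the squared-exponential kernel, its length scale of 0.2, the expected-improvement acquisition, the candidate grid, and the stand-in objective `toy_loss` are all assumptions made for illustration.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel for scalar inputs (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, jitter=1e-6):
    """Fit a GP with a constant mean (the sample mean of ys) to the observations."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - y_mean for y in ys]))
    return L, alpha, y_mean

def predict(xs, model, x_new):
    """Posterior mean and standard deviation of the GP at x_new."""
    L, alpha, y_mean = model
    k_star = [rbf(a, x_new) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)  # rbf(x, x) == 1
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value, for minimization."""
    imp = best - mean
    z = imp / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + std * pdf

def bayes_opt(objective, candidates, init_xs, n_iter=10):
    """Minimize objective over a finite candidate list; returns (best_x, best_y)."""
    xs = list(init_xs)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        model = fit_gp(xs, ys)
        best = min(ys)
        scored = [(expected_improvement(*predict(xs, model, c), best), c)
                  for c in candidates if c not in xs]  # skip evaluated points
        _, x_next = max(scored)
        xs.append(x_next)
        ys.append(objective(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i]

def toy_loss(x):
    """Hypothetical stand-in for a validation loss as a function of one hyperparameter."""
    return (x - 0.3) ** 2

grid = [i / 100 for i in range(101)]  # candidate hyperparameter values in [0, 1]
best_x, best_y = bayes_opt(toy_loss, grid, init_xs=[0.0, 0.5, 1.0], n_iter=10)
print(best_x, best_y)
```

In practice `toy_loss` would be replaced by a function that trains a model with the given hyperparameter and returns its validation error. Each call is then expensive, which is why the surrogate model is used to choose the next value to try instead of evaluating the whole grid.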
Keywords/Search Tags:data mining, feature engineering, parameter adjustment, boost, Bayesian optimization